add support for flz (find last zero) and copy the bitfield functions from PR #565
refactor clz to separate the software and hardware-based implementations