ARM64's MOVN instruction is clever
The bit twiddling office called. They want a function to compute a sequence of ARM64 instructions that fills a register with an arbitrary constant 64-bit number.
Zeros in positive number
All ARM64 instructions are 32 bits long, so one instruction can’t possibly
handle all 64-bit patterns. One straightforward way to assemble the desired bit
pattern is to use
MOVK
which sets any one of the four 16-bit chunks of the register without touching
any other chunks. So, four MOVK
instructions can assemble any 64-bit pattern.
Adding MOVZ
to the mix improves this strategy. Small numbers have many zeros
in the pattern, so we’d rather start by zeroing the whole register than
zeroing out one 16-bit chunk at a time with MOVK
. In one go, MOVZ
sets one
16-bit chunk to the desired value and also zeros out all other chunks. For
small numbers like 7, all we need is one MOVZ
.
The negative ones
Another instruction in the same family is
MOVN
.
It sets the register like MOVZ
, but with all bits inverted. So, MOVN
sets
the three chunks that are not directly encoded in the instruction to all ones.
Why would you want so many ones? Negative numbers!
The most significant bit of a two’s complement number contributes negatively to the value tally unlike the rest of the bits. Adding a leading one to the bit string makes the old most significant bit at position K go from having a value of -2K to 2K, adding 2•2K=2K+1 to the balance. The new leading one at K+1 has value -2K+1 , though, so adding a leading one preserves the numeric value of the encoded number. This is one way to see why -1 is encoded as all bits ones. Start with a single one and add leading ones.
Leading ones are to negative numbers what leading zeros are to positive numbers. 1
Strategy using MOVN
Using MOVN
and MOVK
gives a pretty good strategy for negative numbers.
Sometimes, the whole sequence takes fewer than 64 bits. That’s shorter
than movabs
from x86-64, which uses a 64-bit immediate.
For the constant N
, pick a 16-bit chunk in that is not all bits set and use a
MOVN
to set the chunk, filling all other chunks with ones as well. If no such
chunk exists, then all bits are set, and we’re done with a single MOVN
.
Otherwise, set other chunks of N
with MOVK
.
If there is even a single 16-bit chunk of ones in N
, the initial MOVN
finishes setting two chunks or more in one go, beating out only using MOVK
.
MOVZ
covers positive numbers and MOVN
covers the negatives.
This strategy is not limited to negative numbers. For example, it
sets 0x7000_ffff_cafe_ffff
with just two instructions.
Play with it!
On a browser with WASM support, there should be an interactive toy below that
implements the MOVN
strategy. It only uses MOVN
and MOVK
, so
don’t expect the shortest possible sequence for all inputs. (For one, it won’t attempt to use
a bitmask
immediate.)
It’s made with capstone-rs
compiled through
wasm32-unknown-emscripten
.
Please excuse the huge 3MB download size – I made no effort minimizing it.
-
It follows that
typeof(n)::BITS - n.leading_zeros() - n.leading_ones() + 1
gives the minimum number of bits it takes to encoden
. One of the terms is always zero. ↩︎