Last modified: Fri Jun 19 20:06:52 UTC+0200 2026 © A. Tarpai
64-bit mode basics
These basics cover how the CPU works in 64-bit mode and how it interprets code bytes, sometimes using hand-made code to see the CPU operation in detail.
Maybe some of these results can be used later, and some may not make any sense.
Assemblers also have a hard job in 64-bit mode: encoding what the programmer really wants within what the CPU can actually do.
No 64-bit mode without PE=1
A hardware requirement. The PE bit (CR0.PE) must be set: the CPU operates in protected mode. Every write to SEGREGS (direct, or indirect via FAR transfers) causes a descriptor fetch from the GDTABLE/LDTABLE. Interrupts use IDTR and the IDTABLE. No more Real mode, without exception: it is still possible to run 16-bit code, but only with PE=1.
No 64-bit mode without paging
A hardware requirement. PAE paging with 64-bit entries. To keep this simple, my examples map the 4GB address space linearly (identity-mapped) to physical addresses, as if programming bare metal in 64-bit mode with 4GB of memory.
No segmentation
Paging can also do the job of address space extension and protection, so segmentation is cut out in 64-bit mode. SEGREGS are effectively not used when forming the virtual address that is sent to the paging tables. Only FS/GS are kept (their base is still added to the address), just in case.
Operand size in 64-bit mode
As a general rule, the default operand size stays 32-bit for the same opcodes.
64-bit mode honors the old operand-size prefix, 66h: as on the 386 it means 16-bit operation.
For full 64-bit operation there is the REX prefix (40h..4Fh, sacrificing the 16 short-form inc/dec register instructions):
63 0
+----+----+----+----+----+----+----+----+
| | 48 83 C0 01 add rax, 1
+----+----+----+----+----+----+----+----+
^
| 48h REX.W prefix
|
31 0
+----+----+----+----+
| | 83 C0 01 add eax, 1 (DEFAULT opcode)
+----+----+----+----+
|
| 66h prefix
v
15 0
+----+----+
| | 66 83 C0 01 add ax, 1
+----+----+
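The prefix rules in the diagram above can be sketched as a tiny decoder. This is only a model under assumptions: the function name operand_size is made up for illustration, it looks at prefixes only, and it ignores instructions whose default operand size in 64-bit mode is not 32-bit (eg. push/pop, near branches) as well as byte-sized opcodes.

```python
# Legacy prefixes other than 66h that may precede REX (skipped by this model).
OTHER_LEGACY_PREFIXES = {0x67, 0xF0, 0xF2, 0xF3, 0x2E, 0x36, 0x3E, 0x26, 0x64, 0x65}

def operand_size(code: bytes) -> int:
    """Return the effective operand size (16, 32 or 64) implied by the prefixes."""
    has_66 = False
    rex_w = False
    for b in code:
        if b == 0x66:
            has_66 = True                 # legacy operand-size prefix
        elif b in OTHER_LEGACY_PREFIXES:
            continue                      # not relevant for operand size
        elif 0x40 <= b <= 0x4F:
            rex_w = bool(b & 0x08)        # REX.W is bit 3 of the REX byte
            break                         # REX is the last prefix before the opcode
        else:
            break                         # first opcode byte reached
    if rex_w:
        return 64                         # REX.W takes precedence over 66h
    return 16 if has_66 else 32

for listing in ("48 83 C0 01", "83 C0 01", "66 83 C0 01"):
    print(listing, "->", operand_size(bytes.fromhex(listing)), "bit")
```

The three listings are the add rax/eax/ax encodings from the diagram.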
Register addressing in 64-bit mode
Every register is extended to 64-bit.
REG HI zeroed
In 64-bit mode, every move, logical/arithmetic operation, lea, etc. is 32-bit when using the same opcodes as before: the default operand size is 32-bit.
When the destination is a register, reg HI is zeroed:
63 0
+----+----+----+----+----+----+----+----+
| 00000000000000000 | <-- mov, add, xor, inc..
+----+----+----+----+----+----+----+----+
REGISTER
Eg. no need for REX.W here:
48 33 C0 xor rax, rax
33 C0 xor eax, eax
Both give the same result: rax=0 (the second form saves a code byte).
This also means normal 32-bit code runs just fine in 64-bit mode, which can make some routines re-usable. Eg. loading and using esi in 32-bit code: running the same code bytes in 64-bit mode gives identical results, because RSI HI is zeroed:
[BITS 32] [BITS 64]
BE D2 04 00 00 mov esi, 4D2h mov esi, 4D2h <-- RSI HI zeroed
8B 06 mov eax, dword [esi] mov eax, dword [rsi] <-- RSI: 00000000_xxxxxxxx
But watch out (ml64 MASM syntax):
48 B8 AA AA AA AA AA AA AA AA mov rax, 0AAAAAAAAAAAAAAAAh <-- rax = AAAAAAAA_AAAAAAAA
83 C0 01 add eax, 1 <-- rax = 00000000_AAAAAAAB
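The register write rule can be modelled in a few lines. This is a sketch, not CPU-exact: write_reg is a hypothetical helper, and its 8-bit case models only the low byte (AL-style), not AH-style registers.

```python
MASK64 = (1 << 64) - 1

def write_reg(old: int, value: int, size: int) -> int:
    """Model a write of `size` bits to the low part of a 64-bit register."""
    if size == 64:
        return value & MASK64
    if size == 32:
        return value & 0xFFFFFFFF                    # reg HI is zeroed
    mask = (1 << size) - 1                           # 16- or 8-bit write
    return (old & ~mask & MASK64) | (value & mask)   # upper bits are preserved

rax = write_reg(0, 0xAAAAAAAAAAAAAAAA, 64)           # mov rax, 0AAAAAAAAAAAAAAAAh
rax = write_reg(rax, (rax & 0xFFFFFFFF) + 1, 32)     # add eax, 1
print(f"{rax:016X}")                                 # 00000000AAAAAAAB

rbx = write_reg(0xAAAAAAAAAAAAAAAA, 0x1234, 16)      # a 16-bit write keeps HI
print(f"{rbx:016X}")                                 # AAAAAAAAAAAA1234
```

Only the 32-bit case clears the upper half; 16-bit and 8-bit writes leave it alone.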
Additional registers R8..R15
AMD added another 8 real 64-bit GP registers with RISC-style naming: r8..r15. These can be handy: it was always difficult with just A, B, C and D on Intel, especially since many instructions use certain registers implicitly (like stos etc., so they were never real GP registers).
Access needs a REX prefix bit: REX.B in the examples below, where the register is in the MODRM r/m field (REX.R and REX.X extend the MODRM reg and SIB index fields).
Assemblers use the names r8..r15 for 64-bit access, but as with the old registers, instructions can also encode the 32/16/8-bit portions using prefixes and opcode bits, eg.:
63 31 15 7 0
+----+----+----+----+----+----+----+----+
| | | | | 66h REX.B
+----+----+----+----+----+----+----+----+ | |
| | | | | |
| | | +--------- r13b | 41 80 C5 01 add r13b, 1
| | | | |
| | +-------------- r13w 66 41 83 C5 01 add r13w, 1
| | | |
| +------------------------ r13d* | 41 83 C5 01 add r13d, 1
| | |
+-------------------------------------------- r13 | 49 83 C5 01 add r13, 1
64-bit REGISTER r8..r15
*: the only form zeroing HI
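To read the REX bytes in the listings above, a small decoder helps. A minimal sketch: decode_rex and rm_register are names made up here, and rm_register only handles the register-direct MODRM form (mod=11) used in these add examples.

```python
def decode_rex(byte: int) -> dict:
    """Split a REX prefix (40h..4Fh) into its W, R, X, B bits."""
    if not 0x40 <= byte <= 0x4F:
        raise ValueError("not a REX prefix")
    return {
        "W": (byte >> 3) & 1,   # 64-bit operand size
        "R": (byte >> 2) & 1,   # extends MODRM.reg
        "X": (byte >> 1) & 1,   # extends SIB.index
        "B": byte & 1,          # extends MODRM.rm / SIB.base / opcode reg
    }

def rm_register(rex: int, modrm: int) -> int:
    """Register number selected by MODRM.rm, extended by REX.B."""
    return (decode_rex(rex)["B"] << 3) | (modrm & 0b111)

print(decode_rex(0x41))          # {'W': 0, 'R': 0, 'X': 0, 'B': 1}
print(decode_rex(0x49))          # {'W': 1, 'R': 0, 'X': 0, 'B': 1}
print(rm_register(0x41, 0xC5))   # 13 -> r13
```

With MODRM=C5h the r/m field is 5; REX.B adds 8, giving register 13.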
New 8-bit register addressing: SP-LOW, BP-LOW, SI-LOW, DI-LOW
With REX, the same opcodes access the low 8 bits of 4 more GP registers (SPL, BPL, SIL, DIL) instead of xH. Note the REX prefix without any bits set (40h):
 63                  31        15   7    0                       15   7    0
+----+----+----+----+----+----+----+----+                   ----+----+----+
|                   |         | AH | AL | RAX                   |    | AL | RAX
+----+----+----+----+----+----+----+----+                   ----+----+----+
|                   |         | BH | BL | RBX                   |    | BL | RBX
+----+----+----+----+----+----+----+----+                   ----+----+----+
|                   |         | CH | CL | RCX    40h            |    | CL | RCX
+----+----+----+----+----+----+----+----+      ------->     ----+----+----+
|                   |         | DH | DL | RDX    REX            |    | DL | RDX
+----+----+----+----+----+----+----+----+                   ----+----+----+
|                   |         |         | RSI                   |    | SIL| RSI
+----+----+----+----+----+----+----+----+                   ----+----+----+
|                   |         |         | RDI                   |    | DIL| RDI
+----+----+----+----+----+----+----+----+                   ----+----+----+
|                   |         |         | RBP                   |    | BPL| RBP
+----+----+----+----+----+----+----+----+                   ----+----+----+
|                   |         |         | RSP                   |    | SPL| RSP
+----+----+----+----+----+----+----+----+                   ----+----+----+
I have no idea why; maybe they tried to move towards orthogonal RISC-type register usage, which is easier for compilers to optimize.
The 40h REX works for all other registers too, but it is redundant there. See move.
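It can also be handy, eg. to align RSP to 16 by touching only the low byte (note that a 64-bit immediate is not supported in 64-bit mode: imm8/imm32 are sign-extended). Consider:
48 81 E4 F0 00 00 00 and rsp, 0F0h ; imm32=000000F0h: also clears bits 63..8, NOT an alignment
48 83 E4 F0 and rsp, -16 ; imm8 sign-extended to FFFFFFFF_FFFFFFF0: aligns RSP to 16
40 80 E4 F0 and spl, 0F0h ; same alignment through SPL, same length (4 bytes)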
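A quick model of the and variants, to see which ones really align. This is a sketch: the RSP value is hypothetical, and and_spl / and_rsp are helper names invented for this example. The point is that the immediate of a 64-bit and is sign-extended, so a positive 0F0h mask wipes the upper bits while an 8-bit and on SPL leaves them alone.

```python
MASK64 = (1 << 64) - 1

def sign_extend(value: int, bits: int) -> int:
    """Sign-extend a `bits`-wide immediate to 64 bits."""
    sign = 1 << (bits - 1)
    return ((value ^ sign) - sign) & MASK64

def and_spl(rsp: int, imm8: int) -> int:
    """and spl, imm8: only the low byte of RSP changes."""
    return (rsp & ~0xFF & MASK64) | (rsp & imm8 & 0xFF)

def and_rsp(rsp: int, imm: int, imm_bits: int) -> int:
    """and rsp, imm: the immediate is sign-extended to 64 bits first."""
    return rsp & sign_extend(imm, imm_bits)

rsp = 0x00007FFDE3A1B2C8                          # hypothetical stack pointer
print(f"{and_spl(rsp, 0xF0):016X}")               # 00007FFDE3A1B2C0 (aligned)
print(f"{and_rsp(rsp, 0xF0, 8):016X}")            # 00007FFDE3A1B2C0 (imm8 = -16)
print(f"{and_rsp(rsp, 0x000000F0, 32):016X}")     # 00000000000000C0 (HI lost)
```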
RFLAGS
Simply EFLAGS with HI zero (eg. on stack). Bits 63..32 are still Reserved on all CPUs.
63 0
+----+----+----+----+----+----+----+----+
| 0 | EFLAGS |
+----+----+----+----+----+----+----+----+
RFLAGS
Memory addressing modes 64
Address calculation is 64-bit, using 64-bit base and index registers.
But the displacement bytes following the opcode have not been extended to 64-bit. NB: the displacement is still 1 or 4 bytes and is always sign-extended before being used in address calculations. This is true for 64-bit SIB Direct Memory too (see below):
64-bit mode EA calculation
63 0
+----+----+----+----+----+----+----+----+
| | BASE REG
+----+----+----+----+----+----+----+----+
| | SCALED INDEX REG
+----+----+----+----+----+----+----+----+
| sssssssssssss <-- |s <-- |s | 8/32-BIT SIGNED DISPLACEMENT
+----+----+----+----+----+----+----+----+
_______________________________________________________________________
+----+----+----+----+----+----+----+----+
| | EA Address-size=64 (default)
+----+----+----+----+----+----+----+----+
|
v
+----+----+----+----+----+----+----+----+
| 00000000000000000 | EA Address-size=32 (67h)
+----+----+----+----+----+----+----+----+
8086 legacy 16-bit addressing modes are not possible in 64-bit mode.
Prefix 67h will zero EA HI. Eg.:
48 B8 FF FF FF FF 0E 00 00 00 mov rax, 0EFFFFFFFFh ; rax= 0000000e_ffffffff
BE 80 00 00 00 mov esi, 80h ; rsi= 00000000_00000080
8D 1C 30 lea ebx, [rax+rsi] <-- rbx= 00000000_0000007f Def operand size=32, def address calculation=64
48 8D 1C 30 lea rbx, [rax+rsi] <-- rbx= 0000000f_0000007f 64-bit operand size and 64-bit address calculation
67 8D 1C 30 lea ebx, [eax+esi] <-- rbx= 00000000_0000007f address calculation=32: HI ZERO
67 48 8D 1C 30 lea rbx, [eax+esi] <-- rbx= 00000000_0000007f address calculation=32: HI ZERO
67h note.
I just cannot reverse-engineer whether the addition uses only the LO halves with the carry discarded, or a full register addition followed by zeroing HI. This is binary addition: the result is the same either way.
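That the two possibilities cannot be told apart can be checked with a small model. A sketch only: both function names are made up here, and only base + index is modelled (no scale or displacement).

```python
import random

MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1

def ea32_low_only(base: int, index: int) -> int:
    """Add only the low 32 bits and drop the carry out of bit 31."""
    return ((base & MASK32) + (index & MASK32)) & MASK32

def ea32_full_then_truncate(base: int, index: int) -> int:
    """Do the full 64-bit addition, then zero EA HI."""
    return ((base + index) & MASK64) & MASK32

rax, rsi = 0x0000000EFFFFFFFF, 0x80               # values from the lea test above
print(f"{(rax + rsi) & MASK64:016X}")             # 0000000F0000007F (no 67h)
print(f"{ea32_low_only(rax, rsi):016X}")          # 000000000000007F
print(f"{ea32_full_then_truncate(rax, rsi):016X}")# 000000000000007F

# Random spot check: the low 32 bits of a sum depend only on the low 32 bits.
rng = random.Random(1)
for _ in range(10000):
    a, b = rng.getrandbits(64), rng.getrandbits(64)
    assert ea32_low_only(a, b) == ea32_full_then_truncate(a, b)
```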
RIP-relative
In 64-bit mode, one addressing mode is hijacked for RIP-relative: MOD=00, R/M=101 (5, the EBP slot), formerly the Direct Memory mode.
RIP-relative uses a 32-bit signed displacement, relative to the first byte of the next instruction.
RIP-REL EA calculation:
 63                  31                 0
+----+----+----+----+----+----+----+----+
| sssssssssssss <-- |s                  | 32-BIT SIGNED DISPLACEMENT
+----+----+----+----+----+----+----+----+
|           RIP of next instr           |
+----+----+----+----+----+----+----+----+
_______________________________________________________________________
 63                                     0
+----+----+----+----+----+----+----+----+
|                                       | EA Address-size=64 ONLY
+----+----+----+----+----+----+----+----+
The opcodes are the same, only disp32 now means RIP + disp32 = RIP-relative +/- 2GB. In 64-bit mode disp32 is always sign-extended before being used in address calculations anyway, so this works like the old relative JMP/CALL.
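In my test 67h has NO effect (note: the Intel and AMD manuals describe 67h as truncating the computed RIP-relative EA to 32 bits, so this may depend on where it is tested). Test with lea: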
48 8D 05 5B 00 00 00 lea rax, [sym] <-- rax= 00007FF7_CC2A107A
67 48 8D 05 5B 00 00 00 lea rax, [sym] <-- rax= 00007FF7_CC2A107A (67h has no effect)
Note: ml64 MASM encodes this with RIP (sym was defined in the same section).
NASM syntax for rip: lea rax, [rel sym].
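The RIP-relative calculation is simple enough to model. A sketch under assumptions: rip_relative_ea is a hypothetical helper, and the instruction address below is not from a real run; it is back-calculated so that the 7-byte lea with disp32=5Bh lands on the value shown in the listing.

```python
MASK64 = (1 << 64) - 1

def sign_extend32(disp32: int) -> int:
    """Interpret a 32-bit displacement as a signed Python int."""
    return (disp32 ^ 0x80000000) - 0x80000000

def rip_relative_ea(instr_addr: int, instr_len: int, disp32: int) -> int:
    """EA = address of the NEXT instruction + sign-extended disp32."""
    next_rip = instr_addr + instr_len
    return (next_rip + sign_extend32(disp32)) & MASK64

instr_addr = 0x00007FF7CC2A1018                   # hypothetical, back-calculated
ea = rip_relative_ea(instr_addr, 7, 0x0000005B)   # 48 8D 05 5B 00 00 00
print(f"{ea:016X}")                               # 00007FF7CC2A107A

# A negative displacement reaches backwards: disp32 = -7 points at the lea itself.
print(f"{rip_relative_ea(instr_addr, 7, 0xFFFFFFF9):016X}")  # 00007FF7CC2A1018
```

Because the base is the next instruction, the same disp32 yields a different EA when the instruction length changes.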
64-bit SIB Direct Memory addressing mode
With MODRM + SIB, the redundant SIB Direct Memory form can still be used for an absolute offset.
Direct Memory addressing is when only a single immediate offset follows, and this is the EA. On the 8086 it is a WORD, on the 386 a DWORD. In 64-bit mode it is still a DWORD, but there is a tricky part that can result in surprises: the offset is 32-bit, sign-extended to 64-bit:
Direct memory EA calculation:
63 31 0
+----+----+----+----+----+----+----+----+
| sssssssssssss <-- |s | 32-BIT SIGNED DISPLACEMENT
+----+----+----+----+----+----+----+----+
_______________________________________________________________________
63 0
+----+----+----+----+----+----+----+----+
| | EA Address-size=64 (default)
+----+----+----+----+----+----+----+----+
|
v
+----+----+----+----+----+----+----+----+
| 00000000000000000 | EA Address-size=32 (67h)
+----+----+----+----+----+----+----+----+
Eg. to issue an APIC EOI: mov [0xFEE000B0], 0. In 64-bit mode this writes to FFFFFFFF_FEE000B0! It needs 67h to chop off EA HI:
NASM:
[BITS 64]
31 D2 xor edx, edx
89 1425 B000E0FE mov [0xFEE000B0], edx ; WRONG*: writes to FFFFFFFF_FEE000B0
67 89 1425 B000E0FE a32 mov [0xFEE000B0], edx ; CORRECT: writes to 00000000_FEE000B0
* NASM also gives a warning: dword data exceeds bounds. See test in bootblkbin_int64apic.asm.
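The sign-extension surprise is easy to reproduce in a model. A minimal sketch: direct_ea is a name made up here, and it covers only the SIB Direct Memory form, where disp32 is the whole EA.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1

def direct_ea(disp32: int, addr_size: int = 64) -> int:
    """EA of a disp32-only operand in 64-bit mode."""
    signed = (disp32 ^ 0x80000000) - 0x80000000   # sign-extend the 32-bit offset
    ea = signed & MASK64
    if addr_size == 32:                           # 67h prefix: zero EA HI
        ea &= MASK32
    return ea

print(f"{direct_ea(0xFEE000B0):016X}")            # FFFFFFFFFEE000B0 (wrong target)
print(f"{direct_ea(0xFEE000B0, 32):016X}")        # 00000000FEE000B0 (APIC EOI)
print(f"{direct_ea(0x7FFFFFFF):016X}")            # 000000007FFFFFFF (bit 31 clear)
```

Only offsets with bit 31 set are affected, so anything below 2GB behaves as on the 386.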
Another solution is to move the target address to a register first:
ml64 MASM syntax:
BA B0 00 E0 FE mov edx, 0FEE000B0h ; rdx = 00000000_FEE000B0
33 C0 xor eax, eax
89 02 mov dword ptr [rdx], eax
Another solution is to use the only move instruction that supports a 64-bit offset: the good ol' 8086 A0..A3 accumulator moves:
ml64 MASM syntax:
A3 B0 00 E0 FE 00 00 00 00 mov dword ptr [0FEE000B0h],eax
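For hand-made code, the two store encodings can be built byte by byte. This is a sketch with made-up helper names; it hard-codes edx as the source for the SIB Direct Memory form and eax for the A3 form, matching the listings above.

```python
import struct

def mov_sib_direct_edx(disp32: int, a32: bool = False) -> bytes:
    """mov [disp32], edx using MODRM + SIB Direct Memory (optionally with 67h)."""
    # 89 /r; MODRM=14h: mod=00 reg=010 (edx) rm=100 (SIB follows)
    # SIB=25h: scale=00 index=100 (none) base=101 (disp32, no base register)
    code = bytes([0x89, 0x14, 0x25]) + struct.pack("<I", disp32)
    return (b"\x67" if a32 else b"") + code

def mov_moffs64_eax(addr: int) -> bytes:
    """mov [moffs64], eax: opcode A3 followed by a full 64-bit offset."""
    return b"\xA3" + struct.pack("<Q", addr)

def hexdump(code: bytes) -> str:
    return " ".join(f"{b:02X}" for b in code)

print(hexdump(mov_sib_direct_edx(0xFEE000B0)))        # 89 14 25 B0 00 E0 FE
print(hexdump(mov_sib_direct_edx(0xFEE000B0, True)))  # 67 89 14 25 B0 00 E0 FE
print(hexdump(mov_moffs64_eax(0xFEE000B0)))           # A3 B0 00 E0 FE 00 00 00 00
```

The A3 form costs 9 bytes but carries all 64 address bits, so no sign extension is involved.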
Test with LEA [-5] and prefixes (these use SIB Direct Memory):
8D 1C 25 FB FF FF FF lea ebx, [-5] <-- rbx= 00000000_FFFFFFFB
48 8D 1C 25 FB FF FF FF lea rbx, [-5] <-- rbx= FFFFFFFF_FFFFFFFB sign-extended
67 48 8D 1C 25 FB FF FF FF lea rbx, [-5] <-- rbx= 00000000_FFFFFFFB hand-made code, no idea of syntax