HALICERY

free-time coding, hardware dev, articles


Last modified: Fri Jun 19 20:06:52 UTC+0200 2026 © A. Tarpai


64-bit mode basics

These basics cover how the CPU works in 64-bit mode and how it interprets code bytes, sometimes using hand-made code to observe CPU operation in detail.

Some of these results may be useful later; others have no practical use.

Assemblers also have a hard job in 64-bit mode: they must encode what the programmer really wants within what the CPU can actually do.

No 64-bit mode without PE=1

A hardware requirement. The PE bit must be set: the CPU operates in protected mode. Every write to SEGREGS (direct, or indirect via a FAR transfer) causes a descriptor fetch from the GDT or LDT. Interrupts use IDTR and the IDT. Real mode is gone, without exception: it is still possible to run 16-bit code, but only with PE=1.

No 64-bit mode without paging

A hardware requirement: PAE paging with 64-bit entries. To keep this simple, my examples map the 4GB address space linearly to physical addresses, as if programming in 64-bit mode on bare metal with 4GB of memory.
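A linear 4GB mapping like this can be sketched in a few lines. The Python model below builds the tables as plain lists and walks them the way the MMU would. It is only a sketch under assumptions: it uses 2MB pages, and the table addresses (1000h..6FFFh) are hypothetical; the original examples may lay their tables out differently.

```python
# Sketch: identity-map the low 4 GB with 2 MB pages (4-level long-mode paging).
# Table addresses are hypothetical; any free 4 KB-aligned pages would do.
PRESENT = 1 << 0        # P bit
WRITABLE = 1 << 1       # R/W bit
PAGE_SIZE_2M = 1 << 7   # PS bit in a page-directory entry

PML4_ADDR = 0x1000      # hypothetical
PDPT_ADDR = 0x2000      # hypothetical
PD_ADDR = 0x3000        # four page directories at 0x3000..0x6FFF (hypothetical)


def build_tables():
    """Return {physical address of table: list of 512 64-bit entries}."""
    tables = {PML4_ADDR: [0] * 512, PDPT_ADDR: [0] * 512}
    tables[PML4_ADDR][0] = PDPT_ADDR | PRESENT | WRITABLE
    for gb in range(4):                      # one page directory per GB
        pd_addr = PD_ADDR + gb * 0x1000
        tables[PDPT_ADDR][gb] = pd_addr | PRESENT | WRITABLE
        tables[pd_addr] = [
            (gb << 30) | (i << 21) | PRESENT | WRITABLE | PAGE_SIZE_2M
            for i in range(512)
        ]
    return tables


def translate(tables, vaddr):
    """Walk PML4 -> PDPT -> PD for a mapped address below 4 GB."""
    pml4e = tables[PML4_ADDR][(vaddr >> 39) & 0x1FF]
    pdpte = tables[pml4e & ~0xFFF][(vaddr >> 30) & 0x1FF]
    pde = tables[pdpte & ~0xFFF][(vaddr >> 21) & 0x1FF]
    return (pde & ~0x1FFFFF) | (vaddr & 0x1FFFFF)


tables = build_tables()
print(hex(translate(tables, 0xFEE000B0)))
```

Six 4KB tables are enough: one PML4, one PDPT and four page directories of 512 entries each. The walk does not check the P bit, so it only works for addresses that are actually mapped.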

No segmentation

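Paging can already do the job of address space extension and protection, so segmentation is cut out in 64-bit mode. The bases of CS, DS, ES and SS are treated as zero, so SEGREGS are effectively not used before the virtual address is sent to the paging tables. Only the FS and GS bases are kept and still added to the address.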

Operand size in 64-bit mode

As a general rule, the default operand size stays 32-bit for the same opcodes.

64-bit mode honors the old operand-size prefix, 66h: as on the 386, it selects 16-bit operation.

For full 64-bit operation there is the REX prefix (its 40h..4Fh bytes sacrifice the 16 short-form inc/dec register instructions):

  63                                      0
  +----+----+----+----+----+----+----+----+
  |                                       |    48 83 C0 01  add rax, 1
  +----+----+----+----+----+----+----+----+

                                     ^
                                     | 48h REX.W prefix
                                     |

                      31                  0
                      +----+----+----+----+
                      |                   |       83 C0 01  add eax, 1  (DEFAULT opcode)
                      +----+----+----+----+

                                     |
                                     | 66h prefix
                                     v
                                15        0
                                +----+----+
                                |         |    66 83 C0 01  add ax, 1
                                +----+----+
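
The three cases in the diagram can be written as a small prefix scan. This Python sketch is a simplified model, not a decoder: it assumes the REX byte sits directly before the opcode, and it ignores byte-sized opcodes and the instructions whose default operand size is 64-bit. The function name is made up for illustration.

```python
# Simplified model: effective operand size from the prefixes of one instruction.
LEGACY_PREFIXES = {0x66, 0x67, 0xF0, 0xF2, 0xF3,
                   0x26, 0x2E, 0x36, 0x3E, 0x64, 0x65}


def operand_size(code: bytes) -> int:
    """Return 16, 32 or 64 for the instruction bytes in `code`."""
    seen_66 = False
    rex_w = False
    for byte in code:
        if byte in LEGACY_PREFIXES:
            seen_66 = seen_66 or byte == 0x66
        elif 0x40 <= byte <= 0x4F:          # REX prefix, bit 3 is W
            rex_w = bool(byte & 0x08)
            break
        else:                               # first opcode byte reached
            break
    if rex_w:
        return 64                           # REX.W takes precedence over 66h
    return 16 if seen_66 else 32


for listing in ("4883C001", "83C001", "6683C001"):
    print(listing, operand_size(bytes.fromhex(listing)))
```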

Register addressing in 64-bit mode

Every register is extended to 64-bit.

REG HI zeroed

In 64-bit mode, every move, logical/arithmetic operation, lea, etc. is 32-bit for the same opcodes: the default operand size is 32-bit.

When the destination is a register, a 32-bit write zeroes the upper half (reg HI):

  63                                      0
  +----+----+----+----+----+----+----+----+
  | 00000000000000000                     | <-- mov, add, xor, inc..
  +----+----+----+----+----+----+----+----+
                  REGISTER

Eg. no need for REX.W here:

48 33 C0   xor rax, rax
   33 C0   xor eax, eax

Both give the same result, rax=0, and the second form saves a code byte.

This also means much ordinary 32-bit code runs unchanged in 64-bit mode (as long as it avoids re-assigned opcodes such as the short-form inc/dec), which can make some routines re-usable. E.g. loading and using esi in 32-bit code: running the same code bytes in 64-bit mode gives identical results, because RSI HI is zeroed:

                  [BITS 32]                     [BITS 64]
BE D2 04 00 00    mov  esi, 4D2h                mov  esi, 4D2h         <-- RSI HI zeroed
8B 06             mov  eax, dword [esi]         mov  eax, dword [rsi]  <-- RSI: 00000000_xxxxxxxx

But watch out: a 32-bit operation also wipes out an upper half that was in use (ml64 MASM syntax):

48 B8 AA AA AA AA AA AA AA AA   mov rax, 0AAAAAAAAAAAAAAAAh  <-- rax = AAAAAAAA_AAAAAAAA
83 C0 01                        add eax, 1                   <-- rax = 00000000_AAAAAAAB
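
The write rule can be modelled in a few lines of Python. This is only an illustrative model (write_reg is a made-up helper): 32-bit writes zero-extend into the full register, while 8- and 16-bit writes to the low part leave the rest untouched.

```python
MASK64 = (1 << 64) - 1


def write_reg(old: int, value: int, size: int) -> int:
    """New 64-bit register content after writing `value` with operand size `size`."""
    if size == 64:
        return value & MASK64
    if size == 32:
        return value & 0xFFFFFFFF           # upper half is zeroed
    mask = (1 << size) - 1                  # size is 8 (low byte) or 16
    return (old & ~mask & MASK64) | (value & mask)


rax = 0xAAAAAAAAAAAAAAAA                    # mov rax, 0AAAAAAAAAAAAAAAAh
rax = write_reg(rax, (rax & 0xFFFFFFFF) + 1, 32)   # add eax, 1
print(f"{rax:016X}")                        # 00000000AAAAAAAB
```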

Additional registers R8..R15

AMD added another 8 real 64-bit GP registers, with RISC-style naming: r8..r15. These are handy: it was always difficult with just A, B, C and D on Intel, especially since many instructions use certain registers implicitly (stos etc.), so they were never real GP registers.

Accessing them needs a REX extension bit: REX.B, REX.X or REX.R, depending on which field encodes the register (REX.B in the examples below, where it extends ModRM r/m).

Assemblers use r8..r15 for 64-bit access, but as with the old registers, instructions can also address the 32/16/8-bit portions using prefixes and opcode bits, e.g.:

  63                  31        15   7    0
  +----+----+----+----+----+----+----+----+
  |                   |         |    |    |            66h  REX.B
  +----+----+----+----+----+----+----+----+             |    |
  |                   |         |    |                  |    |
  |                   |         |    +--------- r13b    |    41  80 C5 01  add r13b, 1
  |                   |         |                       |    |
  |                   |         +-------------- r13w    66   41  83 C5 01  add r13w, 1
  |                   |                                 |    |
  |                   +------------------------ r13d*   |    41  83 C5 01  add r13d, 1
  |                                                     |    |
  +-------------------------------------------- r13     |    49  83 C5 01  add r13,  1

          64-bit REGISTER r8..r15


*: the only form zeroing HI
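
How the REX bits combine with ModRM can be sketched as follows. The helper names are hypothetical, and the model only covers the register-direct form (MOD=11) used in the listings above.

```python
def decode_rex(byte: int) -> dict:
    """Split a REX prefix (40h..4Fh) into its W, R, X and B bits."""
    if not 0x40 <= byte <= 0x4F:
        raise ValueError("not a REX prefix")
    return {"W": (byte >> 3) & 1, "R": (byte >> 2) & 1,
            "X": (byte >> 1) & 1, "B": byte & 1}


def modrm_fields(rex: dict, modrm: int):
    """Return (reg, rm) as 4-bit numbers: REX.R extends reg, REX.B extends r/m."""
    reg = ((modrm >> 3) & 7) | (rex["R"] << 3)
    rm = (modrm & 7) | (rex["B"] << 3)
    return reg, rm


# 41 80 C5 01  add r13b, 1: ModRM C5 = 11 000 101
rex = decode_rex(0x41)
reg, rm = modrm_fields(rex, 0xC5)
print(rex, reg, rm)     # rm == 13, i.e. r13
```

In the 80h/83h opcodes the reg field is an opcode extension (0 selects ADD), so only rm names a register here.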

New 8-bit register addressing: SP-LOW, BP-LOW, SI-LOW, DI-LOW

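With any REX prefix present, the 8-bit register codes that used to select AH, CH, DH and BH select the low bytes of RSP, RBP, RSI and RDI instead (SPL, BPL, SIL, DIL), using the same opcodes. Note the REX prefix without any bits set (40h):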

63                  31        15   7    0                           15   7    0
+----+----+----+----+----+----+----+----+                       ----+----+----+
|                   |         | AH | AL |   RAX                     |    | AL |   RAX
+----+----+----+----+----+----+----+----+                       ----+----+----+
|                   |         | BH | BL |   RBX                     |    | BL |   RBX
+----+----+----+----+----+----+----+----+                       ----+----+----+
|                   |         | CH | CL |   RCX      40h            |    | CL |   RCX
+----+----+----+----+----+----+----+----+           ------->    ----+----+----+
|                   |         | DH | DL |   RDX      REX            |    | DL |   RDX
+----+----+----+----+----+----+----+----+                       ----+----+----+
|                   |         |         |   RSI                     |    | SIL|   RSI
+----+----+----+----+----+----+----+----+                       ----+----+----+
|                   |         |         |   RDI                     |    | DIL|   RDI
+----+----+----+----+----+----+----+----+                       ----+----+----+
|                   |         |         |   RBP                     |    | BPL|   RBP
+----+----+----+----+----+----+----+----+                       ----+----+----+
|                   |         |         |   RSP                     |    | SPL|   RSP
+----+----+----+----+----+----+----+----+                       ----+----+----+

I have no idea why; maybe they wanted to move towards orthogonal, RISC-type register usage, which makes compiled code easier to optimize.

The 40h REX works with all other registers too, but it is redundant there. See move.
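
The effect of a REX prefix on 8-bit register selection can be shown as a lookup table. A minimal sketch, where reg8_name is a made-up helper and the register number is the 3-bit field already extended by the matching REX bit:

```python
# 8-bit register selected by a register number, without and with a REX prefix.
WITHOUT_REX = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]
WITH_REX = (["al", "cl", "dl", "bl", "spl", "bpl", "sil", "dil"]
            + [f"r{n}b" for n in range(8, 16)])


def reg8_name(number: int, rex_present: bool) -> str:
    """Name of the 8-bit register for `number` (0..7, or 0..15 with REX)."""
    table = WITH_REX if rex_present else WITHOUT_REX
    if not 0 <= number < len(table):
        raise ValueError("register number not encodable")
    return table[number]


print(reg8_name(4, False), reg8_name(4, True))   # ah spl
```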

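It can also be handy for saving code bytes, e.g. when aligning RSP to 16. Note that immediates here are at most 32-bit, sign-extended to 64-bit, and that the mask must be -16 (FFFFFFFF_FFFFFFF0): a mask of 0F0h would clear the rest of RSP. Consider:

48 81 E4 F0 FF FF FF    and rsp, -16     ; imm32, sign-extended
40 80 E4 F0             and spl, 0F0h

The sign-extended imm8 form, 48 83 E4 F0 (and rsp, -16), is just as short.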
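
A quick model of the masks involved, as a Python sketch with made-up helper names. It shows that and spl, 0F0h gives the same RSP as ANDing the full register with a sign-extended -16, while an imm32 of 000000F0 sign-extends to 00000000_000000F0 and clears everything above bit 7.

```python
MASK64 = (1 << 64) - 1


def sign_extend_to_64(value: int, bits: int) -> int:
    """Sign-extend a `bits`-wide immediate to an unsigned 64-bit pattern."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):
        value -= 1 << bits
    return value & MASK64


def and_spl(rsp: int, imm8: int) -> int:
    """and spl, imm8: only the low byte of RSP changes."""
    return (rsp & ~0xFF & MASK64) | (rsp & imm8 & 0xFF)


def and_rsp(rsp: int, imm: int, imm_bits: int) -> int:
    """and rsp, imm: the immediate is sign-extended to 64 bits first."""
    return rsp & sign_extend_to_64(imm, imm_bits)


rsp = 0x00007FFE1234ABCD                          # arbitrary example value
print(f"{and_spl(rsp, 0xF0):016X}")               # 00007FFE1234ABC0
print(f"{and_rsp(rsp, 0xFFFFFFF0, 32):016X}")     # 00007FFE1234ABC0
print(f"{and_rsp(rsp, 0x000000F0, 32):016X}")     # 00000000000000C0
```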

RFLAGS

Simply EFLAGS with the upper half zero (e.g. when pushed on the stack). Bits 63..32 are still reserved on all CPUs.

  63                                      0
  +----+----+----+----+----+----+----+----+
  |         0         |       EFLAGS      |
  +----+----+----+----+----+----+----+----+
                   RFLAGS

Memory addressing modes 64

Address calculation is 64-bit, using 64-bit base and index registers.

But the displacement bytes following the opcode have not been extended to 64-bit. NB: the displacement is still 1 or 4 bytes and is always sign-extended before being used in address calculations. This is true for the 64-bit SIB Direct Memory mode too (see below):

64-bit mode EA calculation

  63                                      0
  +----+----+----+----+----+----+----+----+
  |                                       | BASE REG
  +----+----+----+----+----+----+----+----+
  |                                       | SCALED INDEX REG
  +----+----+----+----+----+----+----+----+
  | sssssssssssss <-- |s         <-- |s   | 8/32-BIT SIGNED DISPLACEMENT
  +----+----+----+----+----+----+----+----+
_______________________________________________________________________
  +----+----+----+----+----+----+----+----+
  |                                       | EA       Address-size=64 (default)
  +----+----+----+----+----+----+----+----+

                      |
                      v
  +----+----+----+----+----+----+----+----+
  | 00000000000000000                     | EA       Address-size=32 (67h)
  +----+----+----+----+----+----+----+----+

8086 legacy 16-bit addressing modes are not possible in 64-bit mode.

Prefix 67h selects 32-bit address size, which zeroes EA HI. E.g.:

48 B8 FF FF FF FF 0E 00 00 00      mov  rax, 0EFFFFFFFFh  ; rax= 0000000e_ffffffff
BE 80 00 00 00                     mov  esi, 80h          ; rsi= 00000000_00000080

      8D 1C 30   lea  ebx, [rax+rsi]  <-- rbx= 00000000_0000007f   Def operand size=32, def address calculation=64
   48 8D 1C 30   lea  rbx, [rax+rsi]  <-- rbx= 0000000f_0000007f   64-bit operand size and 64-bit address calculation
   67 8D 1C 30   lea  ebx, [eax+esi]  <-- rbx= 00000000_0000007f   address calculation=32: HI ZERO
67 48 8D 1C 30   lea  rbx, [eax+esi]  <-- rbx= 00000000_0000007f   address calculation=32: HI ZERO
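
The four lea results can be reproduced with a small model of the EA calculation. A Python sketch, with function names made up for illustration: the displacement is sign-extended, the sum wraps at 64 bits, and address size 32 (67h) zeroes the upper half before lea stores the result.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` of `value` as a signed integer."""
    sign = 1 << (bits - 1)
    return ((value & ((1 << bits) - 1)) ^ sign) - sign


def effective_address(base=0, index=0, scale=1, disp=0, disp_bits=32,
                      addr_size=64) -> int:
    """EA = base + index*scale + sign-extended displacement."""
    ea = (base + index * scale + sign_extend(disp, disp_bits)) & MASK64
    if addr_size == 32:                     # 67h prefix
        ea &= 0xFFFFFFFF
    return ea


def lea(ea: int, operand_size: int) -> int:
    """Value left in the 64-bit destination register by lea."""
    return ea & ((1 << operand_size) - 1)   # a 32-bit result is zero-extended


rax, rsi = 0xEFFFFFFFF, 0x80
for op_size, addr_size in ((32, 64), (64, 64), (32, 32), (64, 32)):
    ea = effective_address(rax, rsi, addr_size=addr_size)
    print(op_size, addr_size, f"{lea(ea, op_size):016X}")
```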

67h note.

I cannot reverse-engineer whether the addition uses only the LO halves and discards the carry, or adds the full registers and then zeroes HI. With binary addition the low 32 bits are the same either way, so the result is identical.
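
That the two candidate implementations cannot be told apart is easy to check, since addition modulo 2^32 does not depend on the upper bits of the inputs. A small Python sketch comparing both on random 64-bit values (function names are made up for illustration):

```python
import random

M32 = 0xFFFFFFFF


def add_then_truncate(a: int, b: int) -> int:
    """Full-register addition, then zero HI."""
    return (a + b) & M32


def truncate_then_add(a: int, b: int) -> int:
    """Add the LO halves only and drop the carry out of bit 31."""
    return ((a & M32) + (b & M32)) & M32


def always_equal(trials: int = 10000, seed: int = 1) -> bool:
    """Compare both variants on random 64-bit inputs."""
    rng = random.Random(seed)
    return all(
        add_then_truncate(a, b) == truncate_then_add(a, b)
        for a, b in ((rng.getrandbits(64), rng.getrandbits(64))
                     for _ in range(trials))
    )


print(always_equal())
```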

RIP-relative

In 64-bit mode, one addressing mode is hijacked for RIP-relative: MOD=00 with R/M=101 (5, the EBP slot), which used to be Direct Memory (disp32 only).

RIP-relative uses a 32-bit signed displacement, relative to the first byte of the next instruction.

RIP-REL EA calculation:


  63                  31                  0
  +----+----+----+----+----+----+----+----+
  | sssssssssssss <-- |s                  | 32-BIT SIGNED DISPLACEMENT
  +----+----+----+----+----+----+----+----+
  |            RIP of next instr          |
  +----+----+----+----+----+----+----+----+
_______________________________________________________________________
  63                                      0
  +----+----+----+----+----+----+----+----+
  |                                       | EA     Address-size=64 ONLY
  +----+----+----+----+----+----+----+----+

Same opcodes, only that disp32 now means RIP + disp32: RIP-relative, +/- 2GB. In 64-bit mode disp32 is always sign-extended before being used in address calculations anyway, so this works like the old relative JMP/CALL.

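In my test 67h had NO effect, checked with lea (note that the Intel and AMD manuals document 67h as truncating the RIP-relative EA to 32 bits, so this result may depend on the test environment):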

   48 8D 05 5B 00 00 00   lea rax, [sym]    <-- rax= 00007FF7_CC2A107A
67 48 8D 05 5B 00 00 00   lea rax, [sym]    <-- rax= 00007FF7_CC2A107A (67h has no effect)

Note: ml64 MASM encodes this with RIP (sym was defined in the same section).
NASM syntax for rip: lea rax, [rel sym].
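
The RIP-relative arithmetic can be worked through in Python. In this sketch the instruction address 00007FF7_CC2A1018 is not from the original test: it is derived by working backwards from the rax value of the first lea above, assuming its 7-byte encoding. The helper names are made up for illustration.

```python
MASK64 = (1 << 64) - 1


def sign_extend32(value: int) -> int:
    """Interpret a 32-bit pattern as a signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def rip_relative_ea(instr_addr: int, instr_len: int, disp32: int) -> int:
    """EA = address of the next instruction + sign-extended disp32."""
    next_rip = instr_addr + instr_len
    return (next_rip + sign_extend32(disp32)) & MASK64


def disp32_for(instr_addr: int, instr_len: int, target: int) -> int:
    """The disp32 an assembler would have to emit to reach `target`."""
    disp = target - (instr_addr + instr_len)
    if not -(1 << 31) <= disp < (1 << 31):
        raise ValueError("target is outside the +/- 2GB range")
    return disp & 0xFFFFFFFF


# 48 8D 05 5B 00 00 00  lea rax, [sym]  (7 bytes, start address derived)
print(f"{rip_relative_ea(0x7FF7CC2A1018, 7, 0x5B):016X}")
```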

64-bit SIB Direct Memory addressing mode

With MODRM + SIB, the redundant SIB Direct Memory encoding can still be used for an absolute offset.

Direct Memory addressing is when only a single immediate offset follows the opcode, and this offset is the EA. On the 8086 it is a WORD, on the 386 a DWORD. In 64-bit mode it is still a DWORD, but there is a tricky part that can cause surprises: the 32-bit offset is sign-extended to 64-bit:

Direct memory EA calculation:


  63                  31                  0
  +----+----+----+----+----+----+----+----+
  | sssssssssssss <-- |s                  | 32-BIT SIGNED DISPLACEMENT
  +----+----+----+----+----+----+----+----+
_______________________________________________________________________
  63                                      0
  +----+----+----+----+----+----+----+----+
  |                                       | EA     Address-size=64 (default)
  +----+----+----+----+----+----+----+----+

                      |
                      v
  +----+----+----+----+----+----+----+----+
  | 00000000000000000                     | EA     Address-size=32 (67h)
  +----+----+----+----+----+----+----+----+

E.g. to issue an APIC EOI: mov [0xFEE000B0], 0. In 64-bit mode this writes to FFFFFFFF_FEE000B0! It needs 67h to chop off EA HI:

NASM:

[BITS 64]
                   31 D2        xor edx, edx
   89 14 25 B0 00 E0 FE        mov [0xFEE000B0], edx   ; WRONG*:  writes to FFFFFFFF_FEE000B0
67 89 14 25 B0 00 E0 FE    a32 mov [0xFEE000B0], edx   ; CORRECT: writes to 00000000_FEE000B0

* NASM also gives a warning: dword data exceeds bounds. See the test in bootblkbin_int64apic.asm.
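
Which absolute addresses a disp32 can reach follows directly from the sign extension. The Python sketch below models the direct-memory EA and classifies an address by the encoding it needs. The function names and the returned labels are made up for illustration.

```python
MASK64 = (1 << 64) - 1


def direct_memory_ea(disp32: int, addr_size: int = 64) -> int:
    """EA of a disp32-only operand: sign-extended, then cut to 32 bits under 67h."""
    disp32 &= 0xFFFFFFFF
    if disp32 & 0x80000000:
        disp32 -= 1 << 32
    ea = disp32 & MASK64
    if addr_size == 32:
        ea &= 0xFFFFFFFF
    return ea


def absolute_encoding(addr: int) -> str:
    """How a 64-bit absolute address can be reached (labels are illustrative)."""
    if addr < 0x8000_0000 or addr >= 0xFFFF_FFFF_8000_0000:
        return "disp32"                     # sign extension gives the right EA
    if addr <= 0xFFFF_FFFF:
        return "67h + disp32"               # needs the upper half chopped off
    return "register or moffs64"            # out of reach for any disp32


print(f"{direct_memory_ea(0xFEE000B0):016X}")       # FFFFFFFFFEE000B0
print(f"{direct_memory_ea(0xFEE000B0, 32):016X}")   # 00000000FEE000B0
print(absolute_encoding(0xFEE000B0))                # 67h + disp32
```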

Another solution is to move the target address to a register first:

ml64 MASM syntax:

BA B0 00 E0 FE      mov edx, 0FEE000B0h       ; rdx = 00000000_FEE000B0
33 C0               xor eax, eax
89 02               mov dword ptr [rdx], eax

Another solution is to use the only move instructions that support a 64-bit offset: the good ol' 8086 A0..A3 accumulator moves:

ml64 MASM syntax:

A3 B0 00 E0 FE 00 00 00 00     mov dword ptr [0FEE000B0h],eax
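
The layout of these accumulator moves is simple enough to build by hand: one opcode byte followed by the full 8-byte little-endian offset. A Python sketch (the function name is made up, and only the AL/EAX forms without prefixes are covered):

```python
import struct

# A0..A3 accumulator moves; in 64-bit mode the offset is 8 bytes by default.
OPCODES = {
    ("load", 8): 0xA0,    # mov al,  [moffs]
    ("load", 32): 0xA1,   # mov eax, [moffs]
    ("store", 8): 0xA2,   # mov [moffs], al
    ("store", 32): 0xA3,  # mov [moffs], eax
}


def encode_mov_moffs64(direction: str, size: int, addr: int) -> bytes:
    """Opcode byte followed by the 64-bit little-endian absolute offset."""
    return bytes([OPCODES[(direction, size)]]) + struct.pack("<Q", addr)


code = encode_mov_moffs64("store", 32, 0xFEE000B0)
print(" ".join(f"{b:02X}" for b in code))   # A3 B0 00 E0 FE 00 00 00 00
```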

Test with LEA [-5] and prefixes (these use SIB Direct Memory):

      8D 1C 25 FB FF FF FF    lea  ebx, [-5]  <-- rbx= 00000000_FFFFFFFB
   48 8D 1C 25 FB FF FF FF    lea  rbx, [-5]  <-- rbx= FFFFFFFF_FFFFFFFB  sign-extended
67 48 8D 1C 25 FB FF FF FF    lea  rbx, [-5]  <-- rbx= 00000000_FFFFFFFB  hand-made code, no idea of syntax