Last modified: Thu Jun 18 14:38:53 UTC+0200 2026 © A. Tarpai
SHIFT and ROTATE
8086 SHIFT and ROTATE
The 8086 can shift or rotate a BYTE or WORD REG/MEM operand by 1 or by the count in CL. One opcode:
1 1 0 1 0 0 C W   MOD TTT R/M   <-- Shift/Rotate register or memory

TTT  Instruction     W=0: rotate byte
000  ROL             W=1: rotate word
001  ROR
010  RCL             C=0: Shift/rotate count is one
011  RCR             C=1: Shift/rotate count is specified in CL register
100  SHL/SAL
101  SHR
110  -
111  SAR
Operation:
Logical- and Arithmetic shift:
+---+-----------------+      +----+      +-----------------+---+
| 0 --> SHR           | ---> | CF | <--- |           SHL <-- 0 |
+---+-----------------+      +----+      +-----------------+---+
                         logical shift
+---+-----------------+      +----+      +-----------------+---+
| s --> SAR           | ---> | CF | <--- |           SAL <-- 0 |
+---+-----------------+      +----+      +-----------------+---+
                        arithmetic shift
Either n+1 bit Rotate Through Carry (RCL/RCR):
     +---------------------+      +----+      +---------------------+
+--> |         RCR         | ---> | CF | <--- |         RCL         | <--+
|    +---------------------+      +----+      +---------------------+    |
|                                  |  |                                  |
+----------------<-----------------+  +-------------------->-------------+
Or Carry gets a copy of the rotated bit (ROL/ROR):
     +---------------------+      +----+      +---------------------+
+--> |         ROR         | ---> | CF | <--- |         ROL         | <--+
|    +---------------------+  |   +----+   |  +---------------------+    |
|                             |            |                             |
+----------------<------------+            +--------------->-------------+
All of them involve the carry flag (CF): CF contains the last bit shifted out.
Rotates also affect the overflow flag:
In single-bit rotates, OF is set if the operation changes the high-order (sign) bit of the destination
operand (XOR). If the sign bit retains its original value, OF is cleared. On multibit rotates,
the value of OF is always undefined.
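The single-bit rule can be sketched as a small Python model. This is only an illustration written for these notes, not code from any manual: the function name `rotate1` and its argument layout are made up, and it covers exactly the four rotates drawn above.

```python
def rotate1(op, value, cf, width=8):
    """Model one single-bit ROL/ROR/RCL/RCR step.

    Returns (result, new_cf, new_of); new_of is 1 when the sign bit changed.
    """
    mask = (1 << width) - 1
    top = width - 1
    value &= mask
    if op in ("ROL", "RCL"):
        out = (value >> top) & 1                  # bit leaving at the top
        fill = out if op == "ROL" else cf         # ROL re-inserts it, RCL inserts old CF
        result = ((value << 1) & mask) | fill
    elif op in ("ROR", "RCR"):
        out = value & 1                           # bit leaving at the bottom
        fill = out if op == "ROR" else cf
        result = (value >> 1) | (fill << top)
    else:
        raise ValueError("unknown op: " + op)
    new_of = ((value ^ result) >> top) & 1        # sign bit before XOR sign bit after
    return result, out, new_of


for op in ("ROL", "ROR", "RCL", "RCR"):
    result, cf, of = rotate1(op, 0x81, cf=0)
    print(f"{op} 81h, 1 -> {result:02x}h CF={cf} OF={of}")
```

In every case CF receives the bit that left the operand; only the bit that enters on the other side differs between the plain rotates and the rotates through carry.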
Notes:
- The 8086 did not mask the shift/rotate count (large counts waste execution time and delay interrupt response)
- From the 286 on, every count is masked to its 5 least significant bits: 0..31
"The iAPX 286 masks all shift/rotate counts to the low 5 bits. This MOD 32 operation limits the count to a maximum of 31 bits. With this change, the longest shift/rotate instruction is 39 clocks. Without this change, the longest shift/rotate instruction would be 264 clocks, which delays interrupt response until the instruction completes execution."
186/286 SHIFT and ROTATE
Added a new opcode (C0/C1) to shift/rotate a BYTE or WORD R/M by an immediate value (IMM8) at the end of the instruction.
186/286 added by Immediate Count:
1 1 0 0 0 0 0 W   MOD TTT R/M   IMM8
TTT: all the same as on the 8086
The count is masked.
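To make the bit patterns concrete, here is a hedged Python sketch that assembles the bytes for the register-direct forms (MOD = 11) only. The helper name `encode_shift` and its parameters are invented for this example; memory operands, prefixes and displacements are deliberately left out.

```python
# TTT field values from the table above (110 is unused)
TTT = {"ROL": 0, "ROR": 1, "RCL": 2, "RCR": 3,
       "SHL": 4, "SAL": 4, "SHR": 5, "SAR": 7}


def encode_shift(op, rm, word, count):
    """Encode a register-direct shift/rotate.

    rm    : 3-bit register number for the R/M field (e.g. AX/AL = 0, BX/BL = 3)
    word  : True for W=1, False for W=0
    count : 1, the string "CL", or an integer for the 186+ IMM8 form
    """
    w = 1 if word else 0
    modrm = (0b11 << 6) | (TTT[op] << 3) | (rm & 7)
    if count == "CL":
        return bytes([0xD2 | w, modrm])            # 1101 001W
    if count == 1:
        return bytes([0xD0 | w, modrm])            # 1101 000W
    return bytes([0xC0 | w, modrm, count & 0xFF])  # 1100 000W + IMM8


print(encode_shift("SHL", 0, True, "CL").hex())    # shl ax, cl
print(encode_shift("ROR", 0, False, 1).hex())      # ror al, 1
print(encode_shift("SAR", 3, True, 4).hex())       # sar bx, 4
```

A count of 1 is always encoded with the short D0/D1 form here; an assembler could equally emit C0/C1 with IMM8 = 1.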
386 SHIFT and ROTATE
Same old opcodes
1 1 0 1 0 0 C W   MOD TTT R/M          <-- Shift/Rotate register or memory by 1 or CL MOD 32
1 1 0 0 0 0 0 W   MOD TTT R/M   IMM8   <-- Shift/Rotate register or memory by IMM8 MOD 32
With W=1, the operand-size selects a 16- or 32-bit R/M shift/rotate. With a register operand and operand-size = 16, the high word (HI) of the 32-bit register is unchanged (CF is clear before the RCR here):
mov eax, 0x0001_aaaa
rcr ax, 1
eax = 0x0001_5555
Every count (CL or IMM8) is masked to 5 bits.
The CF flag contains the value of the last bit shifted out. But for SHL and SHR with operand-size = 16 (or a BYTE operand), count >= OperandSize can occur, because the count always has 5 bits. In this case CF is undefined.
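The register example above can be replayed with a short Python model. It is a sketch under assumptions: the function name `rcr16_in_eax` is hypothetical, and the 17-bit wrap-around (AX plus CF) is the documented RCR word behaviour rather than something measured for these notes.

```python
def rcr16_in_eax(eax, count, cf):
    """Model `rcr ax, count` on a 386+: only AX changes, bits 31..16 are kept.

    Returns (new_eax, new_cf).
    """
    count &= 0x1F                  # 5-bit mask applied to CL or IMM8
    count %= 17                    # a 16-bit RCR rotates through 17 bits (AX + CF)
    ax = eax & 0xFFFF
    for _ in range(count):
        out = ax & 1               # bit leaving at the bottom
        ax = (ax >> 1) | (cf << 15)
        cf = out
    return (eax & 0xFFFF0000) | ax, cf


eax, cf = rcr16_in_eax(0x0001AAAA, 1, cf=0)
print(f"eax = {eax:#010x}, CF = {cf}")
```

With CF set on entry the same instruction gives 0x0001_d555, which is why the starting value of CF matters for the example.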
386 SHxD Double Shift
386 added Double Shift, a two-operand shift operation. The new thing is that a source register provides the bits shifted into R/M. This source operand remains unchanged. Note:
- count is in CL or IMM8 following opcode
- cannot shift BYTE operand
- no short shift by 1 opcode
- all shift count is MOD 32
SHRD r/m, reg, count
+-----------------+      +-----+      +-----------------+      +----+
|                 | ---> | tmp | ---> |   - - - - ->    | ---> | CF |
+-----------------+      +-----+      +-----------------+      +----+
        REG                                   R/M
SHLD r/m, reg, count
+----+      +-----------------+      +-----+      +-----------------+
| CF | <--- |  <- - - - - -   | <--- | tmp | <--- |                 |
+----+      +-----------------+      +-----+      +-----------------+
                    R/M                                   REG
tmp: allows overlapped operands
REG register (source) remains unaltered
CF is set to the last bit shifted out
OF is set to last sign-change (on my AuthenticAMD 00810F10)
If the count operand is 0, the flags are not affected.
If count >= OperandSize, R/M and all flags UNDEFINED
Eg. to assist BitBLT emulation.
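For the defined range (0 < count < OperandSize) the double shift can be written down directly. The Python sketch below is illustrative only: `shxd` is a made-up name, it returns just the result and CF, and it refuses counts that the documentation calls undefined instead of guessing what a CPU does.

```python
def shxd(op, dest, src, count, width=32):
    """Model SHLD/SHRD on `width`-bit operands (16 or 32).

    Returns (result, cf). A masked count of 0 returns (dest, None): nothing
    changes. Counts >= width are documented as undefined and rejected here.
    """
    count &= 0x1F                                  # MOD 32
    mask = (1 << width) - 1
    dest &= mask
    src &= mask
    if count == 0:
        return dest, None
    if count >= width:
        raise ValueError("count >= operand size: result undefined")
    if op == "SHRD":
        cf = (dest >> (count - 1)) & 1             # last bit shifted out on the right
        result = ((dest >> count) | (src << (width - count))) & mask
    elif op == "SHLD":
        cf = (dest >> (width - count)) & 1         # last bit shifted out on the left
        result = ((dest << count) | (src >> (width - count))) & mask
    else:
        raise ValueError("unknown op: " + op)
    return result, cf


result, cf = shxd("SHRD", 0x00000000, 0x00000001, 1)
print(f"SHRD 0, 1, 1 -> {result:08x} CF={cf}")
```

The source operand is only read, which matches the note that REG remains unaltered.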
0F 2-byte opcodes:
0F 1 0 1 0 D 1 0 C MOD REG R/M [IMM8]
C=0: Shift count is IMM8
C=1: Shift count is specified in CL register
D=0: SHLD - Shift Left Double
D=1: SHRD - Shift Right Double
Shifts operand-size values (D below is the code segment's default-size bit, not the opcode's D bit):
operand-size = 32            operand-size = 16
D=1, or D=0 and 66h          D=0, or D=1 and 66h
SHxD r/m32, r32, imm8        SHxD r/m16, r16, imm8
SHxD r/m32, r32, CL          SHxD r/m16, r16, CL
Using operand-size = 16 it can occur that count >= OperandSize (count is always 5 bits). In this case:
- destination R/M is undefined
- CF, OF, SF, ZF, AF, PF are undefined
Multi-bit rotates and the OF-flag
The docs say undefined, but I was wondering: let's see what really happens. Note that this is my CPU only; it is not tested on other CPUs.
Testbench: execute ROL/ROR/RCL/RCR/SHL/SHR/SAR and SHxD by CL, where CL=0..31, and catch OF.
Result: OF is set according to the last shift/rotate step, at least on my CPU. This makes it possible to XOR any two adjacent bits in R/M (see below).
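The "last step" reading of the result can be modelled in a few lines of Python. This is a sketch of the observed behaviour on the one CPU tested, not of anything the architecture guarantees; the names `step8` and `run8` are invented, and the model handles byte operands only.

```python
def step8(op, v, cf):
    """One single-bit step on a byte; returns (new_value, new_cf)."""
    if op == "ROL":
        return ((v << 1) & 0xFF) | (v >> 7), v >> 7
    if op == "ROR":
        return (v >> 1) | ((v & 1) << 7), v & 1
    if op == "RCL":
        return ((v << 1) & 0xFF) | cf, v >> 7
    if op == "RCR":
        return (v >> 1) | (cf << 7), v & 1
    if op == "SHL":
        return (v << 1) & 0xFF, v >> 7
    if op == "SHR":
        return v >> 1, v & 1
    if op == "SAR":
        return (v >> 1) | (v & 0x80), v & 1
    raise ValueError("unknown op: " + op)


def run8(op, value, count, cf=0):
    """Repeat `op` (count MOD 32) times; OF comes from the last step only.

    Returns (result, of); of is None for a zero count (flags unchanged).
    """
    of = None
    for _ in range(count & 0x1F):
        new, cf = step8(op, value, cf)
        of = (value ^ new) >> 7                    # did this step flip the sign bit?
        value = new
    return value, of


for op, count in (("ROR", 5), ("ROL", 4), ("RCR", 6), ("RCL", 13)):
    result, of = run8(op, 0x30, count)
    print(f"{op} 30h, {count} -> {result:02x} OF={of}")
```

The sample calls correspond to rows (5), (4), (6) and (13) of the byte-rotate test that follows, so the model can be compared against the measured values there.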
CPU test for multi-bit rotates and the OF-flag
Byte rotate. AL=30h. Shift/rotate 0..31 times:
Count ROR CL:          ROL CL:          SHL/SAL CL:      RCR CL (CLC):    RCL CL (CLC):
(0)   30 -> 30 OF=u    30 -> 30 OF=u    30 -> 30 OF=u    30 -> 30 OF=u    30 -> 30 OF=u
(1)   30 -> 18 OF=0    30 -> 60 OF=0    30 -> 60 OF=0    30 -> 18 OF=0    30 -> 60 OF=0
(2)   30 -> 0c OF=0    30 -> c0 OF=1    30 -> c0 OF=1    30 -> 0c OF=0    30 -> c0 OF=1
(3)   30 -> 06 OF=0    30 -> 81 OF=0    30 -> 80 OF=0    30 -> 06 OF=0    30 -> 80 OF=0
(4)   30 -> 03 OF=0    30 -> 03 OF=1    30 -> 00 OF=1    30 -> 03 OF=0    30 -> 01 OF=1
(5)   30 -> 81 OF=1    30 -> 06 OF=0    30 -> 00 OF=0    30 -> 01 OF=0    30 -> 03 OF=0
(6)   30 -> c0 OF=0    30 -> 0c OF=0    30 -> 00 OF=0    30 -> 80 OF=1    30 -> 06 OF=0
(7)   30 -> 60 OF=1    30 -> 18 OF=0    30 -> 00 OF=0    30 -> c0 OF=0    30 -> 0c OF=0
(8)   30 -> 30 OF=0    30 -> 30 OF=0    30 -> ...        30 -> 60 OF=1    30 -> 18 OF=0
(9)   30 -> 18 OF=0    30 -> 60 OF=0    30 ->            30 -> 30 OF=0    30 -> 30 OF=0
(10)  30 -> 0c OF=0    30 -> c0 OF=1    30 ->            30 -> 18 OF=0    30 -> 60 OF=0
(11)  30 -> 06 OF=0    30 -> 81 OF=0    30 ->            30 -> 0c OF=0    30 -> c0 OF=1
(12)  30 -> 03 OF=0    30 -> 03 OF=1    30 ->            30 -> 06 OF=0    30 -> 80 OF=0
(13)  30 -> 81 OF=1    30 -> 06 OF=0    30 ->            30 -> 03 OF=0    30 -> 01 OF=1
(14)  30 -> c0 OF=0    30 -> 0c OF=0    30 ->            30 -> 01 OF=0    30 -> 03 OF=0
(15)  30 -> 60 OF=1    30 -> 18 OF=0    30 ->            30 -> 80 OF=1    30 -> 06 OF=0
(16)  30 -> 30 OF=0    30 -> 30 OF=0    30 ->            30 -> c0 OF=0    30 -> 0c OF=0
(17)  30 -> 18 OF=0    30 -> 60 OF=0    30 ->            30 -> 60 OF=1    30 -> 18 OF=0
(18)  30 -> 0c OF=0    30 -> c0 OF=1    30 ->            30 -> 30 OF=0    30 -> 30 OF=0
(19)  30 -> 06 OF=0    30 -> 81 OF=0    30 ->            30 -> 18 OF=0    30 -> 60 OF=0
(20)  30 -> 03 OF=0    30 -> 03 OF=1    30 ->            30 -> 0c OF=0    30 -> c0 OF=1
(21)  30 -> 81 OF=1    30 -> 06 OF=0    30 ->            30 -> 06 OF=0    30 -> 80 OF=0
(22)  30 -> c0 OF=0    30 -> 0c OF=0    30 ->            30 -> 03 OF=0    30 -> 01 OF=1
(23)  30 -> 60 OF=1    30 -> 18 OF=0    30 ->            30 -> 01 OF=0    30 -> 03 OF=0
(24)  30 -> 30 OF=0    30 -> 30 OF=0    30 ->            30 -> 80 OF=1    30 -> 06 OF=0
(25)  30 -> 18 OF=0    30 -> 60 OF=0    30 ->            30 -> c0 OF=0    30 -> 0c OF=0
(26)  30 -> 0c OF=0    30 -> c0 OF=1    30 ->            30 -> 60 OF=1    30 -> 18 OF=0
(27)  30 -> 06 OF=0    30 -> 81 OF=0    30 ->            30 -> 30 OF=0    30 -> 30 OF=0
(28)  30 -> 03 OF=0    30 -> 03 OF=1    30 ->            30 -> 18 OF=0    30 -> 60 OF=0
(29)  30 -> 81 OF=1    30 -> 06 OF=0    30 ->            30 -> 0c OF=0    30 -> c0 OF=1
(30)  30 -> c0 OF=0    30 -> 0c OF=0    30 ->            30 -> 06 OF=0    30 -> 80 OF=0
(31)  30 -> 60 OF=1    30 -> 18 OF=0    30 ->            30 -> 03 OF=0    30 -> 01 OF=1
u: unchanged on zero count
SAR: all cases OF=0 (msb never changes, that is the point)
SHR: OF works for single-bit shift - but further shift will keep sign zero, OF=0
CPU test for multi-bit double-shift and the OF-flag
Here src=1 and dest is zero. SHRD 0..31 times:
AuthenticAMD 00810F10
SHRD eax, edx, cl
00000000 -> 00000000 OF=u CF=u (0) u: unchanged
00000000 -> 80000000 OF=1 CF=0 (1)
00000000 -> 40000000 OF=1 CF=0 (2)
00000000 -> 20000000 OF=0 CF=0 (3)
00000000 -> 10000000 OF=0 CF=0 (4)
00000000 -> 08000000 OF=0 CF=0 (5)
00000000 -> 04000000 OF=0 CF=0 (6)
00000000 -> 02000000 OF=0 CF=0 (7)
00000000 -> 01000000 OF=0 CF=0 (8)
00000000 -> 00800000 OF=0 CF=0 (9)
00000000 -> 00400000 OF=0 CF=0 (10)
00000000 -> 00200000 OF=0 CF=0 (11)
00000000 -> 00100000 OF=0 CF=0 (12)
00000000 -> 00080000 OF=0 CF=0 (13)
00000000 -> 00040000 OF=0 CF=0 (14)
00000000 -> 00020000 OF=0 CF=0 (15)
00000000 -> 00010000 OF=0 CF=0 (16)
00000000 -> 00008000 OF=0 CF=0 (17)
00000000 -> 00004000 OF=0 CF=0 (18)
00000000 -> 00002000 OF=0 CF=0 (19)
00000000 -> 00001000 OF=0 CF=0 (20)
00000000 -> 00000800 OF=0 CF=0 (21)
00000000 -> 00000400 OF=0 CF=0 (22)
00000000 -> 00000200 OF=0 CF=0 (23)
00000000 -> 00000100 OF=0 CF=0 (24)
00000000 -> 00000080 OF=0 CF=0 (25)
00000000 -> 00000040 OF=0 CF=0 (26)
00000000 -> 00000020 OF=0 CF=0 (27)
00000000 -> 00000010 OF=0 CF=0 (28)
00000000 -> 00000008 OF=0 CF=0 (29)
00000000 -> 00000004 OF=0 CF=0 (30)
00000000 -> 00000002 OF=0 CF=0 (31)
CPU tests for count >= OperandSize
CPU test for count >= OperandSize: SHRD
Here src=1 and dest is zero. SHRD 0..31 times:
It kinda rotates on my CPU, it is not undefined.
A count of 16 is also possible: it moves REG into R/M.
AuthenticAMD 00810F10
SHRD ax, dx, cl
00000000 -> 00000000 OF=u CF=u (0) u: unchanged
00000000 -> 00008000 OF=1 CF=0 (1)
00000000 -> 00004000 OF=1 CF=0 (2)
00000000 -> 00002000 OF=0 CF=0 (3)
00000000 -> 00001000 OF=0 CF=0 (4)
00000000 -> 00000800 OF=0 CF=0 (5)
00000000 -> 00000400 OF=0 CF=0 (6)
00000000 -> 00000200 OF=0 CF=0 (7)
00000000 -> 00000100 OF=0 CF=0 (8)
00000000 -> 00000080 OF=0 CF=0 (9)
00000000 -> 00000040 OF=0 CF=0 (10)
00000000 -> 00000020 OF=0 CF=0 (11)
00000000 -> 00000010 OF=0 CF=0 (12)
00000000 -> 00000008 OF=0 CF=0 (13)
00000000 -> 00000004 OF=0 CF=0 (14)
00000000 -> 00000002 OF=0 CF=0 (15)
00000000 -> 00000001 OF=0 CF=0 (16)
00000000 -> 00008000 OF=1 CF=0 (17)
00000000 -> 00004000 OF=1 CF=0 (18)
00000000 -> 00002000 OF=0 CF=0 (19)
00000000 -> 00001000 OF=0 CF=0 (20)
00000000 -> 00000800 OF=0 CF=0 (21)
00000000 -> 00000400 OF=0 CF=0 (22)
00000000 -> 00000200 OF=0 CF=0 (23)
00000000 -> 00000100 OF=0 CF=0 (24)
00000000 -> 00000080 OF=0 CF=0 (25)
00000000 -> 00000040 OF=0 CF=0 (26)
00000000 -> 00000020 OF=0 CF=0 (27)
00000000 -> 00000010 OF=0 CF=0 (28)
00000000 -> 00000008 OF=0 CF=0 (29)
00000000 -> 00000004 OF=0 CF=0 (30)
00000000 -> 00000002 OF=0 CF=0 (31)
CPU test for count >= OperandSize: ROR
Here ax=0x0010 and rotate 0..31 times:
AuthenticAMD 00810F10
ror ax, cl
00000010 -> 00000010 OF=u CF=u (0) u: unchanged
00000010 -> 00000008 OF=0 CF=0 (1)
00000010 -> 00000004 OF=0 CF=0 (2)
00000010 -> 00000002 OF=0 CF=0 (3)
00000010 -> 00000001 OF=0 CF=0 (4)
00000010 -> 00008000 OF=1 CF=1 (5)
00000010 -> 00004000 OF=1 CF=0 (6)
00000010 -> 00002000 OF=0 CF=0 (7)
00000010 -> 00001000 OF=0 CF=0 (8)
00000010 -> 00000800 OF=0 CF=0 (9)
00000010 -> 00000400 OF=0 CF=0 (10)
00000010 -> 00000200 OF=0 CF=0 (11)
00000010 -> 00000100 OF=0 CF=0 (12)
00000010 -> 00000080 OF=0 CF=0 (13)
00000010 -> 00000040 OF=0 CF=0 (14)
00000010 -> 00000020 OF=0 CF=0 (15)
00000010 -> 00000010 OF=0 CF=0 (16)
00000010 -> 00000008 OF=0 CF=0 (17)
00000010 -> 00000004 OF=0 CF=0 (18)
00000010 -> 00000002 OF=0 CF=0 (19)
00000010 -> 00000001 OF=0 CF=0 (20)
00000010 -> 00008000 OF=1 CF=1 (21)
00000010 -> 00004000 OF=1 CF=0 (22)
00000010 -> 00002000 OF=0 CF=0 (23)
00000010 -> 00001000 OF=0 CF=0 (24)
00000010 -> 00000800 OF=0 CF=0 (25)
00000010 -> 00000400 OF=0 CF=0 (26)
00000010 -> 00000200 OF=0 CF=0 (27)
00000010 -> 00000100 OF=0 CF=0 (28)
00000010 -> 00000080 OF=0 CF=0 (29)
00000010 -> 00000040 OF=0 CF=0 (30)
00000010 -> 00000020 OF=0 CF=0 (31)
Nicely rotates a 16-bit register, all values and flags properly set.
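The values and CF in the ROR table above follow a closed form: the count is taken MOD 32, the rotation distance MOD 16, and CF is a copy of the new msb whenever the masked count is not zero. A small Python sketch, with a made-up helper name `ror16` and without any attempt to model OF:

```python
def ror16(ax, count):
    """Model `ror ax, cl` for CL = 0..31 (count is MOD 32, rotation is MOD 16).

    Returns (result, cf); cf is None when the masked count is 0 (flags unchanged).
    """
    count &= 0x1F
    ax &= 0xFFFF
    if count == 0:
        return ax, None
    r = count % 16                                 # effective rotation distance
    result = ((ax >> r) | (ax << (16 - r))) & 0xFFFF
    return result, result >> 15                    # CF = copy of the new msb


for count in (4, 5, 16, 21):
    result, cf = ror16(0x0010, count)
    print(f"({count}) 0010 -> {result:04x} CF={cf}")
```

A count of 16 leaves AX as it was but still refreshes CF, which is the difference from a count of 0.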
XOR-ing bits by shift instructions
The OF-flag is set, when the operation changes the high-order (sign) bit of the destination operand.
This is XOR-ing and maybe it can be useful in some situations.
Where do the bits come from?
The complete figure for each shift type, showing where the incoming bits come from (and the possibility of an XOR-test with OF):
(SHRD) REG        MSB                      LSB       REG (SHLD)
(RCR)  CF        +---+---------  ---------+---+      CF  (RCL)
(ROR)  LSB  ---> |            R/M             | <--- MSB (ROL)
(SHR)  0         +---+---------  ---------+---+      0   (SHL/SAL)
(SAR)  MSB         |
       |           |
       +---- XOR ---+
             |
             OF
Single-bit rotates and the OF-flag for XOR-ing
Full analysis in case some of these might be useful.
Right single-bit rotate/shift
Let's look at the source of the new msb in right rotates/shifts:
                                                         After operations
                                                         updating SF
SHRD: REG-LSB     MSB               OF = MSB ^ REG-LSB   OF = SF ^ REG-LSB
RCR:  CF         +---+---------     OF = MSB ^ CF        OF = SF ^ CF
ROR:  LSB   ---> |                  OF = MSB ^ LSB       OF = SF ^ LSB
SHR:  0          +---+---------     OF = MSB             OF = SF
SAR:  MSB          |                OF = 0               OF = 0
       |           |
       +---- XOR ---+
             |
             OF
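The table boils down to one rule: after a single-bit right shift or rotate, OF is the old MSB XOR the bit that enters at the top. A hedged Python sketch of that rule (the function name `right1_of` and its keyword arguments are invented here):

```python
def right1_of(op, value, cf=0, reg=0, width=8):
    """OF after a single-bit right shift/rotate of `value`.

    cf  : carry flag before the instruction (used by RCR)
    reg : source register of SHRD (only its LSB matters for a count of 1)
    """
    top = width - 1
    msb = (value >> top) & 1
    incoming = {
        "SHRD": reg & 1,        # REG-LSB
        "RCR": cf,              # old carry
        "ROR": value & 1,       # own LSB
        "SHR": 0,
        "SAR": msb,             # sign bit is replicated
    }[op]
    return msb ^ incoming


for op in ("SHRD", "RCR", "ROR", "SHR", "SAR"):
    print(op, right1_of(op, 0x80, cf=1, reg=1))
```

With value 80h, CF = 1 and REG-LSB = 1, only SHR reports OF = 1, because it is the only one of the five that shifts a 0 into a set sign bit.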
RCR 1: OF is XOR of original carry and the MSB    ROR 1: We can XOR msb- and lsb-bits
+---+      +---+-----------------+                +---+-------------+---+
| C | ---> | a                   |                | a                 b |
+---+      +---+-----------------+                +---+-------------+---+
           +---+---+-------------+                +---+-------------+---+      +---+
           | C   a               |                | b   a               | ---> | b |
           +---+---+-------------+                +---+-------------+---+      +---+
            |     |                                 |                 |         CF
            + XOR +                                 +------ XOR ------+
               |                                             |
         OF = CF ^ MSB                                  OF = a ^ b
SHR 1: OF is original MSB                   SAR 1: OF is always zero
+---+-----------------+                     +---+-----------------+
| a                   |                     | a                   |
+---+-----------------+                     +---+-----------------+
+---+---+-------------+      +---+          +---+---+-------------+      +---+
| 0   a               | ---> |LSB|          | a   a               | ---> |LSB|
+---+---+-------------+      +---+          +---+---+-------------+      +---+
 |     |                      CF             |     |                      CF
 + XOR +                                     + XOR +
    |                                           |
 OF = a                                      OF = 0
SHRD 1: XOR two operands' lsb- and msb-bit:
  REG src      R/M dest
     LSB        MSB
----+---+      +---+-----------
    | a | ---> | b                SHRD R/M, REG, 1
----+---+      +---+-----------
      |         |
      +-- XOR --+
           |
           OF
Left single-bit rotate/shift
All left shifts OF = MSB ^ MSB-1:
                                     After operations
                                     updating SF
All left shifts   OF = MSB ^ MSB-1   OF = SF ^ MSB-1
We can XOR the two msb-bits by eg. SHL 1
           +---+---+-------------+
           | a   b               |
           +---------------------+
+---+      +---+-----------------+
| a | <--- | b                   |
+---+      +---+-----------------+
  |         |
  +-- XOR --+
       |
       OF
The OF flag is affected only on 1-bit shifts. For left shifts, the OF flag is set to 0 if the most-significant bit of the result is the same as the CF flag (that is, the top two bits of the original operand were the same); otherwise, it is set to 1.
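The two wordings (CF versus the new msb, or the top two bits of the original operand) describe the same value. A short Python sketch that computes OF the first way, with a made-up function name `shl1`:

```python
def shl1(value, width=8):
    """Model SHL by 1: returns (result, cf, of)."""
    mask = (1 << width) - 1
    top = width - 1
    value &= mask
    cf = (value >> top) & 1                        # old MSB goes to CF
    result = (value << 1) & mask
    of = cf ^ ((result >> top) & 1)                # CF vs. new MSB (= old MSB-1)
    return result, cf, of


for v in (0x60, 0xC0, 0x80):
    result, cf, of = shl1(v)
    print(f"SHL {v:02x}h, 1 -> {result:02x}h CF={cf} OF={of}")
```

Checking all 256 byte values against `((v >> 7) ^ (v >> 6)) & 1` confirms that OF is the XOR of the two top bits of the original operand.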
XOR-ing FLAG bits
  7    6    5    4    3    2    1    0
+----+----+----+----+----+----+----+----+
| SF | ZF |    | AF |    | PF |    | CF |
+----+----+----+----+----+----+----+----+
Using lahf and AH:
lahf             lahf             lahf             lahf
rcr ah, 1        ror ah, 1        rol ah, 1        shl ah, 1
OF = SF ^ CF     OF = SF ^ CF     OF = SF ^ ZF     OF = SF ^ ZF
Preserves CF     Preserves CF     CF = SF          CF = SF
Same result for WORD or DWORD rotates in AX/EAX.
So... maybe ROR is our friend here. Should work with all operand sizes.
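The lahf plus ror ah, 1 combination can be checked exhaustively with a short Python model. This is a sketch: `lahf` and `ror_ah_1` are just illustrative function names, and the AH layout assumed is the documented one (SF ZF 0 AF 0 PF 1 CF, with bit 1 always reading as 1).

```python
from itertools import product


def lahf(sf, zf, af, pf, cf):
    """AH as loaded by LAHF: SF ZF 0 AF 0 PF 1 CF (bit 1 always reads 1)."""
    return (sf << 7) | (zf << 6) | (af << 4) | (pf << 2) | 0x02 | cf


def ror_ah_1(ah):
    """`ror ah, 1`: returns (new_ah, cf, of)."""
    new = (ah >> 1) | ((ah & 1) << 7)
    return new, ah & 1, ((ah ^ new) >> 7) & 1


for sf, cf in product((0, 1), repeat=2):
    _, cf_out, of = ror_ah_1(lahf(sf, 0, 0, 0, cf))
    print(f"SF={sf} CF={cf} -> OF={of} CF={cf_out}")
```

For every combination of the five flags the model gives OF = SF ^ CF and leaves CF as it was, independent of ZF, AF and PF.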
Eg. XOR these bits: 3 and 4
000ab000
0000ab00 1
00000ab0 2
000000ab 3
b000000a 4
ab000000 5 ← here XOR
So: XOR bits n and n+1 = ROR n+2
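The ROR n+2 rule can be tested in a model that assumes OF reflects the last single-bit step, as observed on my CPU (this is not guaranteed by the documentation). The Python sketch below uses invented names and does not apply the MOD 32 count mask, so it is only meaningful for counts up to 31.

```python
def ror_last_step_of(value, count, width=8):
    """OF after ROR by `count`, assuming OF reflects the last single-bit step."""
    top = width - 1
    value &= (1 << width) - 1
    of = None                                      # None: zero count, OF unchanged
    for _ in range(count):
        new = (value >> 1) | ((value & 1) << top)
        of = ((value ^ new) >> top) & 1            # sign bit flipped in this step?
        value = new
    return of


def xor_adjacent(value, n, width=8):
    """bit n XOR bit n+1 of `value`, obtained as OF after ROR by n+2."""
    return ror_last_step_of(value, n + 2, width)


# 000ab000 with a=1, b=0: bits 3 and 4 differ
print(xor_adjacent(0b00010000, 3))
```

After k right rotations the msb holds original bit k-1 regardless of the operand width, which is why the same count works for BYTE and WORD operands in this model.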
+---+---+---+---+---+---+---+---+
| | | | | | | | |
+---+---+---+---+---+---+---+---+
For the SAR instruction, the OF flag is cleared for all 1-bit shifts.
For the SHR instruction, the OF flag is set to the most-significant bit of the original operand.
+---+-----------------+
| a        SHR        |     We can test the original msb-bit after SHR
+---+-----------------+
  |
  OF
+---+-----+---+-------+      +---+
| 0 -->     a         | ---> | C |
+---+-----+---+-------+      +---+
16: just rotated and rotated, all values and flags properly set