Commit 587ed5e

Author: H. Peter Anvin (Intel)

x86/bytecode.txt: improve byte code documentation

Improve the byte code reference documentation to make a few opcodes
more clear and add some general properties about the byte codes,
including the files that need to be changed when the byte code changes.

Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
1 parent e9fac2f commit 587ed5e

1 file changed

+83
-29
lines changed

x86/bytecode.txt

Lines changed: 83 additions & 29 deletions
@@ -1,3 +1,5 @@
+-*- text -*-
+
 Bytecode specification
 ----------------------
 
@@ -9,31 +11,72 @@ hexadecimal.
 
 The mnemonics are the ones used in x86/insns.dat, where applicable.
 
+The byte code is not stable. Byte codes can be moved around and
+recycled at any time. x86/insnsb.c contains a generated table of
+byte code use frequencies as a comment near the end that can be
+used to identify candidates for recycling, if necessary.
+
+Several byte codes are equivalent to sequences of other byte codes; if
+those have low usage counts they can be good candidates for
+recycling.
+
+Operands are numbered starting with 0.
+
+Operand numbers encoded in byte codes only encode two bits of the
+operand number, with the opcodes \5, \6 and \7 used as a prefixes to
+escape to operands 4+. This saves a lot of byte coding space, as these
+operands are extremely rare.
+
+When byte codes are changed, the following files MUST be updated
+accordingly:
+
+this file
+x86/insns.pl - many locations
+disasm/disasm.c - matches()
+asm/assemble.c - calcsize(), gencode(), find_match(), jmp_match()
+
 In x86/insns.dat, the encoding slot of each operand is encoded as:
 
 - implicit operand (no encoding)
 x+y multiple encoding slots for one operand
-r "r" position in modr/m, or base register with "+r"
+r "r" position in modr/m[1], or base register with "+r"[2]
 m "m" position in modr/m
-n immediate encoded in the "m" position in modr/m
-b register encoded in the "m" position in modr/m
+n immediate encoded in the "m" position in modr/m[3]
+b register encoded in the "m" position in modr/m[4]
 x register encoded in the "x" position in modr/m + sib (MIB)
 v "v" register position in vex/evex
-s "s" registe rposition in /is4
-w immediate encoded in the "v" position in vex/evex
-i first immediate or mem_offs
-j second immediate or mem_offs
-
-Codes Mnemonic Explanation
-
-\0 terminates the code. (Unless it's a literal of course.)
-\1..\4 that many literal bytes follow in the code stream
-\5 add 4 to the primary operand number (b, low octdigit)
-\6 add 4 to the secondary operand number (a, middle octdigit)
-\7 add 4 to both the primary and the secondary operand number
-\10..\13 a literal byte follows in the code stream, to be added
+s "s" register position in /is4
+w immediate encoded in the "v" position in vex/evex[3]
+i first immediate or mem_offs[5]
+j second immediate or mem_offs[6]
+
+[1] currently used even for register operands, even though "b" is an
+alias in that case.
+[2] this is technically incorrect and should be "b", but that is the
+way it is currently encoded.
+[3] separate letter code for the benefit of the insns.pl sanity checker.
+[4] currently used mainly when "x" is also used.
+[5] when the modr/m displacement is used as an immediate, it is byte
+coded as an *address-sized* immediate and uses "i". A seg:offs
+pair uses "i" for the offset (thus "ji").
+[6] when the modr/m displacement is used as an immediate and
+another ("true") immediate is present, the "true" immediate uses "j".
+A seg:offs pair uses "j" for the segment (thus "ji").
+
+
+XX below indicates a hexadecimal byte; NN a decimal number.
+
+Codes Mnemonic Definition
+
+\0 (auto-generated) end of code sequence (but 0 can be part of a multi-byte
+sequence, so byte codes are NOT null-terminated strings.)
+\1..\4 XX XX... that many literal bytes follow in the code stream
+\5 (auto-generated) add 4 to the primary operand number (b, low octdigit)
+\6 (auto-generated) add 4 to the secondary operand number (a, middle octdigit)
+\7 (auto-generated) add 4 to both the primary and the secondary operand number
+\10..\13 +r a literal byte follows in the code stream, to be added
 to the register value of operand 0..3
-\14..\17 the position of index register operand in MIB (BND insns)
+\14..\17 (auto-generated) the position of index register operand in MIB (BND insns)
 \20..\23 ib a byte immediate operand, from operand 0..3
 \24..\27 ib,u a zero-extended byte immediate operand, from operand 0..3
 \30..\33 iw a word immediate operand, from operand 0..3
@@ -54,17 +97,20 @@ Codes Mnemonic Explanation
 \171\mab /mrb (e.g /3r0) a ModRM, with the reg field taken from operand a, and the m
 and b fields set to the specified values.
 \172\ab /is4 the register number from operand a in bits 7..4, with
-the 4-bit immediate from operand b in bits 3..0.
-\173\xab the register number from operand a in bits 7..4, with
+the 4-bit immediate from operand b in bits 2..0.
+For EVEX- or REX2-encodable instructions, the operand is encoded in
+bits [3:7..4] and the immediate is restricted to 3 bits
+unless the register operand is given the rn_l16 operand flag.
+\173\xab /is4=NN the register number from operand a in bits 7..4, with
 the value b in bits 3..0.
-\174..\177 the register number from operand 0..3 in bits 7..4, and
+\174..\177 /is4 the register number from operand 0..3 in bits 7..4, and
 an arbitrary value in bits 3..0 (assembled as zero.)
 \2ab /b a ModRM, calculated on EA in operand a, with the reg
 field equal to digit b.
-\240..\243 this instruction uses EVEX rather than REX or VEX/XOP, with the
+\240..\243 evex.* this instruction uses EVEX rather than REX or VEX/XOP, with the
 V register number taken from operand "b" (0..3) (which may
 be an immediate, as is used for DFV.)
-\250 this instruction uses EVEX rather than REX or VEX/XOP, with the
+\250 evex.* this instruction uses EVEX rather than REX or VEX/XOP, with the
 V register number set to 0 (subject to the XOR as defined
 below)
 
@@ -88,10 +134,10 @@ EVEX prefixes are followed by the sequence:
 (compressed displacement encoding)
 
 \254..\257 id,s a signed 32-bit operand to be extended to 64 bits.
-\260..\263 this instruction uses VEX/XOP rather than REX, with the
+\260..\263 vex.* this instruction uses VEX/XOP rather than REX, with the
 V register taken from operand "b" 0..3.
 \264..\267 id,u an unsigned 32-bit operand to be extended to 64 bits.
-\270 this instruction uses VEX/XOP rather than REX, with the
+\270 vex.* this instruction uses VEX/XOP rather than REX, with the
 V register set to 0.
 VEX/XOP prefixes are followed by the sequence:
 \tmm\wlp tmm format: tt 0mm mmm
@@ -112,16 +158,20 @@ VEX/XOP prefixes are followed by the sequence:
 
 t = 0 for VEX (C4/C5), t = 1 for XOP (8F).
 
-\271 hlex instruction takes XRELEASE (F3) with or without lock
+vex+.* instruction is encodable either with VEX or EVEX,
+depending on the operands. Generates multiple
+instruction patterns with different operand encoding
+and byte codes.
+\271 hlex instruction takes XRELEASE (F3) with or without lock
 \272 hlenl instruction takes XACQUIRE/XRELEASE with or without lock
 \273 hle instruction takes XACQUIRE/XRELEASE with lock only
 \274..\277 ib,s a byte immediate operand, from operand 0..3, sign-extended
 to the operand size (if o16/o32/o64 present) or the bit size
 \300..\303 ibn a valid 0F NOP opcode.
-\304..\307
-\0\xNN ib^NN intermediate byte XOR 0xNN
-\1\xNN ib,s^NN signed intermediate byte XOR 0xNN
-\2\xNN ib,u^NN unsigned intermediate byte XOR 0xNN
+\304..\307 a byte immediate from operand 0..3, XOR a specific constant.
+\0\xXX ib^XX intermediate byte XOR 0xXX
+\1\xXX ib,s^XX signed intermediate byte XOR 0xXX
+\2\xXX ib,u^XX unsigned intermediate byte XOR 0xXX
 \310 a16 indicates fixed 16-bit address size, i.e. optional 0x67.
 \311 a32 indicates fixed 32-bit address size, i.e. optional 0x67.
 \312 adf, asz (disassembler only) invalid with non-default address size.
@@ -185,3 +235,7 @@ t = 0 for VEX (C4/C5), t = 1 for XOP (8F).
 \376 vsibz|vm32z|vm64z this instruction takes an ZMM VSIB memory EA
 
 * No 66 prefix is emitted if combined with VEX/EVEX, np, 66, osp or !osp.
+
+## Local variables:
+## fill-column: 99
+## End
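
The added text on operand numbering and on \0 not being a string terminator can be illustrated with a small walker. This is only a sketch, not NASM's implementation: the function name `walk` and the event tuples are hypothetical, only the codes \0, \1..\4, \5..\7 and \20..\23 are handled, and it assumes that a \5/\6/\7 escape applies to the next operand-carrying byte code and is then cleared.

```python
def walk(codes):
    """Walk a byte code sequence, handling a small illustrative subset.

    Returns a list of (kind, value) events. Hypothetical helper, not NASM code.
    """
    events = []
    add_primary = 0    # pending +4 for the primary operand (b, low octdigit)
    add_secondary = 0  # pending +4 for the secondary operand (a, middle octdigit)
    i = 0
    while i < len(codes):
        c = codes[i]
        i += 1
        if c == 0o0:
            # End of sequence, but only when seen in opcode position.
            events.append(("end", None))
            break
        elif 0o1 <= c <= 0o4:
            # That many literal bytes follow; a 0 among them is data, not an end marker.
            events.append(("literal", bytes(codes[i:i + c])))
            i += c
        elif c == 0o5:
            add_primary = 4
        elif c == 0o6:
            add_secondary = 4
        elif c == 0o7:
            add_primary = add_secondary = 4
        elif 0o20 <= c <= 0o23:
            # ib: byte immediate; the low two bits select operand 0..3,
            # and a pending escape lifts that to operand 4..7.
            events.append(("ib", (c & 3) + add_primary))
            add_primary = add_secondary = 0  # assumption: escapes are one-shot
        else:
            raise ValueError("byte code %#o is outside this sketch" % c)
    return events


# Two literal bytes (0F 00), then an escaped ib taken from operand 5, then the end.
print(walk([0o2, 0x0F, 0x00, 0o5, 0o21, 0o0]))
```

None of the codes in this subset use a secondary operand, so `add_secondary` is tracked only to show where \6 and \7 would take effect.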

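The /is4 layout described for \173 (register number in bits 7..4, a fixed value in bits 3..0) amounts to simple bit packing. The helper below is a hypothetical illustration of that layout only; it does not model the EVEX/REX2 restriction mentioned for \172, and `is4_byte` is not a name taken from the NASM sources.

```python
def is4_byte(reg, value):
    """Pack a register number into bits 7..4 and a 4-bit value into bits 3..0."""
    if not 0 <= reg <= 15:
        raise ValueError("register number must fit in 4 bits")
    if not 0 <= value <= 15:
        raise ValueError("value must fit in 4 bits")
    return (reg << 4) | value


# Register 3 with the value 0xA in the low nibble.
print(hex(is4_byte(3, 0xA)))
```

With a value of 0 this matches the description of \174..\177, where the low bits are assembled as zero.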