1+ -*- text -*-
2+
13Bytecode specification
24----------------------
35
@@ -9,31 +11,72 @@ hexadecimal.
911
1012The mnemonics are the ones used in x86/insns.dat, where applicable.
1113
14+ The byte code is not stable. Byte codes can be moved around and
15+ recycled at any time. x86/insnsb.c contains a generated table of
16+ byte code use frequencies as a comment near the end that can be
17+ used to identify candidates for recycling, if necessary.
18+
19+ Several byte codes are equivalent to sequences of other byte codes; if
20+ those have low usage counts they can be good candidates for
21+ recycling.
22+
23+ Operands are numbered starting with 0.
24+
25+ Operand numbers encoded in byte codes only encode two bits of the
26+ operand number, with the opcodes \5, \6 and \7 used as prefixes to
27+ escape to operands 4+. This saves a lot of byte coding space, as these
28+ operands are extremely rare.
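
As a rough illustration of the escape scheme above, the following Python sketch recovers full operand numbers from a byte whose low octal digit (b) and middle octal digit (a) each carry two bits of an operand number. The function name and the way the prefix is passed in are assumptions made for this example; it is not the code found in asm/assemble.c or disasm/disasm.c.

```python
def operand_numbers(code, prefix=0):
    """Return (primary, secondary) operand numbers for one byte.

    code   -- byte whose low octal digit (b) and middle octal digit (a)
              each carry two bits of an operand number
    prefix -- 0 for no prefix, or 0o5, 0o6, 0o7 for the \\5, \\6, \\7
              escapes (hypothetical calling convention for this sketch)
    """
    if prefix not in (0, 0o5, 0o6, 0o7):
        raise ValueError("prefix must be 0, 0o5, 0o6 or 0o7")

    primary = code & 3             # two low bits of the low octdigit
    secondary = (code >> 3) & 3    # two low bits of the middle octdigit

    if prefix in (0o5, 0o7):       # \5 and \7 add 4 to the primary
        primary += 4
    if prefix in (0o6, 0o7):       # \6 and \7 add 4 to the secondary
        secondary += 4
    return primary, secondary
```

For a byte code that only has a primary operand, such as \20..\23, the second element of the result is not meaningful and can be ignored. Under these assumptions, `operand_numbers(0o21, prefix=0o5)[0]` yields operand 5.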
29+
30+ When byte codes are changed, the following files MUST be updated
31+ accordingly:
32+
33+ this file
34+ x86/insns.pl - many locations
35+ disasm/disasm.c - matches()
36+ asm/assemble.c - calcsize(), gencode(), find_match(), jmp_match()
37+
1238In x86/insns.dat, the encoding slot of each operand is encoded as:
1339
1440 - implicit operand (no encoding)
1541 x+y multiple encoding slots for one operand
16- r "r" position in modr/m, or base register with "+r"
42+ r "r" position in modr/m[1] , or base register with "+r"[2]
1743 m "m" position in modr/m
18- n immediate encoded in the "m" position in modr/m
19- b register encoded in the "m" position in modr/m
44+ n immediate encoded in the "m" position in modr/m[3]
45+ b register encoded in the "m" position in modr/m[4]
2046 x register encoded in the "x" position in modr/m + sib (MIB)
2147 v "v" register position in vex/evex
22- s "s" registe rposition in /is4
23- w immediate encoded in the "v" position in vex/evex
24- i first immediate or mem_offs
25- j second immediate or mem_offs
26-
27- Codes Mnemonic Explanation
28-
29- \0 terminates the code. (Unless it's a literal of course.)
30- \1..\4 that many literal bytes follow in the code stream
31- \5 add 4 to the primary operand number (b, low octdigit)
32- \6 add 4 to the secondary operand number (a, middle octdigit)
33- \7 add 4 to both the primary and the secondary operand number
34- \10..\13 a literal byte follows in the code stream, to be added
48+ s "s" register position in /is4
49+ w immediate encoded in the "v" position in vex/evex[3]
50+ i first immediate or mem_offs[5]
51+ j second immediate or mem_offs[6]
52+
53+ [1] currently used even for register operands, even though "b" is an
54+ alias in that case.
55+ [2] this is technically incorrect and should be "b", but that is the
56+ way it is currently encoded.
57+ [3] separate letter code for the benefit of the insns.pl sanity checker.
58+ [4] currently used mainly when "x" is also used.
59+ [5] when the modr/m displacement is used as an immediate, it is byte
60+ coded as an *address-sized* immediate and uses "i". A seg:offs
61+ pair uses "i" for the offset (thus "ji").
62+ [6] when the modr/m displacement is used as an immediate and
63+ another ("true") immediate is present, the "true" immediate uses "j".
64+ A seg:offs pair uses "j" for the segment (thus "ji").
65+
66+
67+ XX below indicates a hexadecimal byte; NN a decimal number.
68+
69+ Codes Mnemonic Definition
70+
71+ \0 (auto-generated) end of code sequence (but 0 can be part of a multi-byte
72+ sequence, so byte codes are NOT null-terminated strings.)
73+ \1..\4 XX XX... that many literal bytes follow in the code stream
74+ \5 (auto-generated) add 4 to the primary operand number (b, low octdigit)
75+ \6 (auto-generated) add 4 to the secondary operand number (a, middle octdigit)
76+ \7 (auto-generated) add 4 to both the primary and the secondary operand number
77+ \10..\13 +r a literal byte follows in the code stream, to be added
3578 to the register value of operand 0..3
36- \14..\17 the position of index register operand in MIB (BND insns)
79+ \14..\17 (auto-generated) the position of index register operand in MIB (BND insns)
3780\20..\23 ib a byte immediate operand, from operand 0..3
3881\24..\27 ib,u a zero-extended byte immediate operand, from operand 0..3
3982\30..\33 iw a word immediate operand, from operand 0..3
@@ -54,17 +97,20 @@ Codes Mnemonic Explanation
5497\171\mab /mrb (e.g /3r0) a ModRM, with the reg field taken from operand a, and the m
5598 and b fields set to the specified values.
5699\172\ab /is4 the register number from operand a in bits 7..4, with
57- the 4-bit immediate from operand b in bits 3..0.
58- \173\xab the register number from operand a in bits 7..4, with
100+ the 4-bit immediate from operand b in bits 3..0.
101+ For EVEX- or REX2-encodable instructions, the operand is encoded in
102+ bits [3:7..4] and the immediate is restricted to 3 bits
103+ unless the register operand is given the rn_l16 operand flag.
104+ \173\xab /is4=NN the register number from operand a in bits 7..4, with
59105 the value b in bits 3..0.
60- \174..\177 the register number from operand 0..3 in bits 7..4, and
106+ \174..\177 /is4 the register number from operand 0..3 in bits 7..4, and
61107 an arbitrary value in bits 3..0 (assembled as zero.)
62108\2ab /b a ModRM, calculated on EA in operand a, with the reg
63109 field equal to digit b.
64- \240..\243 this instruction uses EVEX rather than REX or VEX/XOP, with the
110+ \240..\243 evex.* this instruction uses EVEX rather than REX or VEX/XOP, with the
65111 V register number taken from operand "b" (0..3) (which may
66112 be an immediate, as is used for DFV.)
67- \250 this instruction uses EVEX rather than REX or VEX/XOP, with the
113+ \250 evex.* this instruction uses EVEX rather than REX or VEX/XOP, with the
68114 V register number set to 0 (subject to the XOR as defined
69115 below)
70116
@@ -88,10 +134,10 @@ EVEX prefixes are followed by the sequence:
88134 (compressed displacement encoding)
89135
90136\254..\257 id,s a signed 32-bit operand to be extended to 64 bits.
91- \260..\263 this instruction uses VEX/XOP rather than REX, with the
137+ \260..\263 vex.* this instruction uses VEX/XOP rather than REX, with the
92138 V register taken from operand "b" 0..3.
93139\264..\267 id,u an unsigned 32-bit operand to be extended to 64 bits.
94- \270 this instruction uses VEX/XOP rather than REX, with the
140+ \270 vex.* this instruction uses VEX/XOP rather than REX, with the
95141 V register set to 0.
96142VEX/XOP prefixes are followed by the sequence:
97143\tmm\wlp tmm format: tt 0mm mmm
@@ -112,16 +158,20 @@ VEX/XOP prefixes are followed by the sequence:
112158
113159t = 0 for VEX (C4/C5), t = 1 for XOP (8F).
114160
115- \271 hlex instruction takes XRELEASE (F3) with or without lock
161+ vex+.* instruction is encodable either with VEX or EVEX,
162+ depending on the operands. Generates multiple
163+ instruction patterns with different operand encoding
164+ and byte codes.
165+ \271 hlex instruction takes XRELEASE (F3) with or without lock
116166\272 hlenl instruction takes XACQUIRE/XRELEASE with or without lock
117167\273 hle instruction takes XACQUIRE/XRELEASE with lock only
118168\274..\277 ib,s a byte immediate operand, from operand 0..3, sign-extended
119169 to the operand size (if o16/o32/o64 present) or the bit size
120170\300..\303 ibn a valid 0F NOP opcode.
121- \304..\307
122- \0\xNN ib^NN intermediate byte XOR 0xNN
123- \1\xNN ib,s^NN signed intermediate byte XOR 0xNN
124- \2\xNN ib,u^NN unsigned intermediate byte XOR 0xNN
171+ \304..\307 a byte immediate from operand 0..3, XOR a specific constant.
172+ \0\xXX ib^XX immediate byte XOR 0xXX
173+ \1\xXX ib,s^XX signed immediate byte XOR 0xXX
174+ \2\xXX ib,u^XX unsigned immediate byte XOR 0xXX
125175\310 a16 indicates fixed 16-bit address size, i.e. optional 0x67.
126176\311 a32 indicates fixed 32-bit address size, i.e. optional 0x67.
127177\312 adf, asz (disassembler only) invalid with non-default address size.
@@ -185,3 +235,7 @@ t = 0 for VEX (C4/C5), t = 1 for XOP (8F).
185235\376 vsibz|vm32z|vm64z this instruction takes a ZMM VSIB memory EA
186236
187237* No 66 prefix is emitted if combined with VEX/EVEX, np, 66, osp or !osp.
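
The /is4 byte layout given for \173\xab and \174..\177 above (a register number in bits 7..4 and a 4-bit value in bits 3..0) can be sketched in Python as follows. The function name is hypothetical, and the sketch covers only the plain 16-register case; it does not model the EVEX/REX2 restriction or the rn_l16 operand flag.

```python
def pack_is4(reg, low=0):
    """Pack a register number and a 4-bit value into one /is4 byte.

    reg -- register number, placed in bits 7..4
    low -- value placed in bits 3..0 (zero by default, matching the
           "assembled as zero" case described for \\174..\\177)
    """
    if not 0 <= reg <= 15:
        raise ValueError("register number must fit in 4 bits")
    if not 0 <= low <= 15:
        raise ValueError("low value must fit in 4 bits")
    return (reg << 4) | low
```

With these assumptions, register 3 combined with the value 5 packs to the byte 0x35.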
238+
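
The definition of \0 notes that byte codes are not null-terminated strings, because a zero byte can sit inside a multi-byte sequence. The Python sketch below shows why for the simplest case: it understands only \0 and the literal-byte codes \1..\4 and refuses everything else, so it is a deliberately restricted illustration rather than a complete byte code walker. The function name is an assumption made for this example.

```python
def code_length(codes):
    """Length of a byte code sequence, including its \\0 terminator.

    Only \\0 (end of sequence) and \\1..\\4 (that many literal bytes
    follow) are handled; any other byte code raises an error.
    """
    i = 0
    while i < len(codes):
        c = codes[i]
        if c == 0:
            return i + 1           # terminator found
        if 1 <= c <= 4:
            i += 1 + c             # skip the code and its literal bytes
            continue
        raise NotImplementedError("byte code %#o not handled here" % c)
    raise ValueError("sequence has no terminator")


# \2 followed by the literal bytes 0F 00, then the \0 terminator.
sequence = bytes([0o2, 0x0F, 0x00, 0o0])
```

Here `code_length(sequence)` counts all four bytes, while searching for the first zero byte would stop one byte early, inside the literal.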
239+ ## Local variables:
240+ ## fill-column: 99
241+ ## End: