 # Appendix: Macro Follow-Set Ambiguity Formal Specification

+r[macro.ambiguity]
+
 This page documents the formal specification of the follow rules for [Macros
 By Example]. They were originally specified in [RFC 550], from which the bulk
 of this text is copied, and expanded upon in subsequent RFCs.

 ## Definitions & Conventions

+r[macro.ambiguity.convention]
+
+r[macro.ambiguity.convention.defs]
   - `macro`: anything invokable as `foo!(...)` in source code.
   - `MBE`: macro-by-example, a macro defined by `macro_rules`.
   - `matcher`: the left-hand-side of a rule in a `macro_rules` invocation, or a
@@ -46,11 +51,13 @@ macro_rules! i_am_an_mbe {
 }
 ```

+r[macro.ambiguity.convention.matcher]
 `(start $foo:expr $($i:ident),* end)` is a matcher. The whole matcher is a
 delimited sequence (with open- and close-delimiters `(` and `)`), and `$foo`
 and `$i` are simple NT's with `expr` and `ident` as their respective fragment
 specifiers.

+r[macro.ambiguity.convention.complex-nt]
 `$($i:ident),*` is *also* an NT; it is a complex NT that matches a
 comma-separated repetition of identifiers. The `,` is the separator token for
 the complex NT; it occurs in between each pair of elements (if any) of the
@@ -65,16 +72,19 @@ token.
 proper nesting of token tree structure and correct matching of open- and
 close-delimiters.)

+r[macro.ambiguity.convention.vars]
 We will tend to use the variable "M" to stand for a matcher, variables "t" and
 "u" for arbitrary individual tokens, and the variables "tt" and "uu" for
 arbitrary token trees. (The use of "tt" does present potential ambiguity with
 its additional role as a fragment specifier; but it will be clear from context
 which interpretation is meant.)

+r[macro.ambiguity.convention.set]
 "SEP" will range over separator tokens, "OP" over the repetition operators
 `*`, `+`, and `?`, "OPEN"/"CLOSE" over matching token pairs surrounding a
 delimited sequence (e.g. `[` and `]`).

+r[macro.ambiguity.convention.sequence-vars]
 Greek letters "α" "β" "γ" "δ" stand for potentially empty token-tree sequences.
 (However, the Greek letter "ε" (epsilon) has a special role in the presentation
 and does not stand for a token-tree sequence.)
@@ -101,6 +111,9 @@ purposes of the formalism, we will treat `$v:vis` as actually being

 ### The Matcher Invariants

+r[macro.ambiguity.invariant]
+
+r[macro.ambiguity.invariant.list]
 To be valid, a matcher must meet the following three invariants. The definitions
 of FIRST and FOLLOW are described later.

@@ -112,18 +125,21 @@ of FIRST and FOLLOW are described later.
 1. For an unseparated complex NT in a matcher, `M = ... $(tt ...) OP ...`, if
    OP = `*` or `+`, we must have FOLLOW(`tt ...`) ⊇ FIRST(`tt ...`).

+r[macro.ambiguity.invariant.follow-matcher]
 The first invariant says that whatever actual token comes after a matcher,
 if any, must be somewhere in the predetermined follow set. This ensures that a
 legal macro definition will continue to assign the same determination as to
 where `... tt` ends and `uu ...` begins, even as new syntactic forms are added
 to the language.

+r[macro.ambiguity.invariant.separated-complex-nt]
 The second invariant says that a separated complex NT must use a separator token
 that is part of the predetermined follow set for the internal contents of the
 NT. This ensures that a legal macro definition will continue to parse an input
 fragment into the same delimited sequence of `tt ...`'s, even as new syntactic
 forms are added to the language.

+r[macro.ambiguity.invariant.unseparated-complex-nt]
 The third invariant says that when we have a complex NT that can match two or
 more copies of the same thing with no separation in between, it must be
 permissible for them to be placed next to each other as per the first invariant.
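
The three invariants can be illustrated with a few small `macro_rules` definitions. This is a minimal sketch, not part of the specification: the macro names (`arrow`, `sum`, `count_idents`, `bad`) are hypothetical and chosen only for illustration.

```rust
// First invariant: `$key:expr` is followed by `=>`, which is in FOLLOW(expr).
macro_rules! arrow {
    ($key:expr => $value:expr) => {
        ($key, $value)
    };
}

// Second invariant: the separator `,` is in FOLLOW(expr), so a
// comma-separated repetition of expressions is a legal complex NT.
macro_rules! sum {
    ($($e:expr),*) => {
        0 $(+ $e)*
    };
}

// Third invariant: FOLLOW(ident) is ANYTOKEN, so identifiers may be
// repeated with no separator in between.
macro_rules! count_idents {
    ($($i:ident)*) => {
        <[&str]>::len(&[$(stringify!($i)),*])
    };
}

// Rejected by the compiler, so it is left commented out: an identifier is
// not in FOLLOW(expr), which violates the first invariant.
// macro_rules! bad {
//     ($e:expr $i:ident) => {};
// }

fn main() {
    println!("{:?}", arrow!(1 + 1 => "two"));
    println!("{}", sum!(1, 2, 3));
    println!("{}", count_idents!(a b c));
}
```

Because every token that follows an `expr` fragment in these matchers is drawn from the predetermined follow set, the point where each expression ends stays fixed even if the expression grammar grows.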
@@ -137,6 +153,9 @@ invalid in a future edition of Rust. See the [tracking issue].**

 ### FIRST and FOLLOW, informally

+r[macro.ambiguity.sets]
+
+r[macro.ambiguity.sets.intro]
 A given matcher M maps to three sets: FIRST(M), LAST(M) and FOLLOW(M).

 Each of the three sets is made up of tokens. FIRST(M) and LAST(M) may also
@@ -145,12 +164,15 @@ can match the empty fragment. (But FOLLOW(M) is always just a set of tokens.)

 Informally:

+r[macro.ambiguity.sets.first]
   * FIRST(M): collects the tokens potentially used first when matching a
     fragment to M.

+r[macro.ambiguity.sets.last]
   * LAST(M): collects the tokens potentially used last when matching a fragment
     to M.

+r[macro.ambiguity.sets.follow]
   * FOLLOW(M): the set of tokens allowed to follow immediately after some
     fragment matched by M.

@@ -163,6 +185,7 @@ Informally:

     * The concatenation α β γ δ is a parseable Rust program.

+r[macro.ambiguity.sets.universe]
 We use the shorthand ANYTOKEN to denote the set of all tokens (including simple
 NTs). For example, if any token is legal after a matcher M, then FOLLOW(M) =
 ANYTOKEN.
@@ -174,18 +197,27 @@ definitions.)

 ### FIRST, LAST

+r[macro.ambiguity.sets.def]
+
+r[macro.ambiguity.sets.def.intro]
 Below are formal inductive definitions for FIRST and LAST.

+r[macro.ambiguity.sets.def.notation]
 "A ∪ B" denotes set union, "A ∩ B" denotes set intersection, and "A \ B"
 denotes set difference (i.e. all elements of A that are not present in B).

 #### FIRST

+r[macro.ambiguity.sets.def.first]
+
+r[macro.ambiguity.sets.def.first.intro]
 FIRST(M) is defined by case analysis on the sequence M and the structure of its
 first token-tree (if any):

+r[macro.ambiguity.sets.def.first.epsilon]
   * if M is the empty sequence, then FIRST(M) = { ε },

+r[macro.ambiguity.sets.def.first.token]
   * if M starts with a token t, then FIRST(M) = { t },

     (Note: this covers the case where M starts with a delimited token-tree
@@ -195,6 +227,7 @@ first token-tree (if any):
     (Note: this critically relies on the property that no simple NT matches the
     empty fragment.)

+r[macro.ambiguity.sets.def.first.complex]
   * Otherwise, M is a token-tree sequence starting with a complex NT: `M = $( tt
     ... ) OP α`, or `M = $( tt ... ) SEP OP α`, (where `α` is the (potentially
     empty) sequence of token trees for the rest of the matcher).
@@ -229,12 +262,18 @@ with respect to \varepsilon as well.

 #### LAST

+r[macro.ambiguity.sets.def.last]
+
+r[macro.ambiguity.sets.def.last.intro]
 LAST(M) is defined by case analysis on M itself (a sequence of token-trees):

+r[macro.ambiguity.sets.def.last.empty]
   * if M is the empty sequence, then LAST(M) = { ε }

+r[macro.ambiguity.sets.def.last.token]
   * if M is a singleton token t, then LAST(M) = { t }

+r[macro.ambiguity.sets.def.last.rep-star]
   * if M is the singleton complex NT repeating zero or more times, `M = $( tt
     ... ) *`, or `M = $( tt ... ) SEP *`

@@ -245,6 +284,7 @@ LAST(M), defined by case analysis on M itself (a sequence of token-trees):
       * otherwise, the sequence `tt ...` must be non-empty; LAST(M) = LAST(`tt
         ...`) ∪ {ε}.

+r[macro.ambiguity.sets.def.last.rep-plus]
   * if M is the singleton complex NT repeating one or more times, `M = $( tt ...
     ) +`, or `M = $( tt ... ) SEP +`

@@ -255,12 +295,15 @@ LAST(M), defined by case analysis on M itself (a sequence of token-trees):
       * otherwise, the sequence `tt ...` must be non-empty; LAST(M) = LAST(`tt
         ...`)

+r[macro.ambiguity.sets.def.last.rep-question]
   * if M is the singleton complex NT repeating zero or one time, `M = $( tt ...)
     ?`, then LAST(M) = LAST(`tt ...`) ∪ {ε}.

+r[macro.ambiguity.sets.def.last.delim]
   * if M is a delimited token-tree sequence `OPEN tt ... CLOSE`, then LAST(M) =
     { `CLOSE` }.

+r[macro.ambiguity.sets.def.last.sequence]
   * if M is a non-empty sequence of token-trees `tt uu ...`,

       * If ε ∈ LAST(`uu ...`), then LAST(M) = LAST(`tt`) ∪ (LAST(`uu ...`) \ { ε }).
@@ -320,25 +363,35 @@ Here are similar examples but now for LAST.

 ### FOLLOW(M)

+r[macro.ambiguity.sets.def.follow]
+
+r[macro.ambiguity.sets.def.follow.intro]
 Finally, the definition for FOLLOW(M) is built up as follows. pat, expr, etc.
 represent simple nonterminals with the given fragment specifier.

+r[macro.ambiguity.sets.def.follow.pat]
   * FOLLOW(pat) = {`=>`, `,`, `=`, `|`, `if`, `in`}.

+r[macro.ambiguity.sets.def.follow.expr-stmt]
   * FOLLOW(expr) = FOLLOW(expr_2021) = FOLLOW(stmt) = {`=>`, `,`, `;`}.

+r[macro.ambiguity.sets.def.follow.ty-path]
   * FOLLOW(ty) = FOLLOW(path) = {`{`, `[`, `,`, `=>`, `:`, `=`, `>`, `>>`, `;`,
     `|`, `as`, `where`, block nonterminals}.

+r[macro.ambiguity.sets.def.follow.vis]
   * FOLLOW(vis) = {`,`; any keyword or identifier except a non-raw `priv`; any
     token that can begin a type; ident, ty, and path nonterminals}.

+r[macro.ambiguity.sets.def.follow.simple]
   * FOLLOW(t) = ANYTOKEN for any other simple token, including block, ident,
     tt, item, lifetime, literal and meta simple nonterminals, and all terminals.

+r[macro.ambiguity.sets.def.follow.other-matcher]
   * FOLLOW(M), for any other M, is defined as the intersection, as t ranges over
     (LAST(M) \ {ε}), of FOLLOW(t).

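
The definitions of LAST and FOLLOW above can be sketched as a small executable model. This is an illustration under stated assumptions rather than the compiler's implementation: the types `Tt`, `Sym`, and `Follow` and the functions `last`, `last_one`, `follow_of`, and `follow` are hypothetical names. The sketch assumes that, in the repetition cases, the branch for ε ∈ LAST(`tt ...`) (not shown in this excerpt) adds the separator token, if any. It leaves FOLLOW(ty), FOLLOW(path), and FOLLOW(vis) unmodelled, and it treats an intersection over no tokens as ANYTOKEN.

```rust
use std::collections::BTreeSet;

/// Hypothetical token-tree model; the names are illustrative, not rustc's.
enum Tt {
    /// A terminal token such as `,` or `=>`.
    Tok(&'static str),
    /// A simple NT, named by its fragment specifier (e.g. "expr").
    Nt(&'static str),
    /// `OPEN tt ... CLOSE`; only the closing delimiter matters for LAST.
    Delim(&'static str),
    /// `$( tt ... ) SEP OP` with an optional SEP and OP in {'*', '+', '?'}.
    Rep(Vec<Tt>, Option<&'static str>, char),
}

/// An element of LAST(M): ε, a terminal, or a simple NT.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Sym {
    Eps,
    Tok(&'static str),
    Nt(&'static str),
}

/// A FOLLOW set is either ANYTOKEN or an explicit set of tokens.
#[derive(Debug, PartialEq)]
enum Follow {
    Any,
    Only(BTreeSet<&'static str>),
}

/// LAST of a single token tree.
fn last_one(tt: &Tt) -> BTreeSet<Sym> {
    match tt {
        Tt::Tok(t) => BTreeSet::from([Sym::Tok(*t)]),
        Tt::Nt(spec) => BTreeSet::from([Sym::Nt(*spec)]),
        Tt::Delim(close) => BTreeSet::from([Sym::Tok(*close)]),
        Tt::Rep(inner, sep, op) => {
            let mut set = last(inner);
            if set.contains(&Sym::Eps) {
                // Assumed branch: the separator (if any) may come last.
                set.extend(sep.map(Sym::Tok));
            } else if *op != '+' {
                // `*` and `?` may match nothing at all.
                set.insert(Sym::Eps);
            }
            set
        }
    }
}

/// LAST of a token-tree sequence.
fn last(m: &[Tt]) -> BTreeSet<Sym> {
    match m {
        [] => BTreeSet::from([Sym::Eps]),
        [tt] => last_one(tt),
        [tt, rest @ ..] => {
            let mut set = last(rest);
            // If ε ∈ LAST(`uu ...`), swap ε for LAST(`tt`); else keep LAST(`uu ...`).
            if set.remove(&Sym::Eps) {
                set.extend(last_one(tt));
            }
            set
        }
    }
}

/// FOLLOW of a single token or simple NT (ty, path and vis are not modelled).
fn follow_of(s: Sym) -> Follow {
    let only = |ts: &[&'static str]| Follow::Only(ts.iter().copied().collect());
    match s {
        Sym::Nt("pat") => only(&["=>", ",", "=", "|", "if", "in"]),
        Sym::Nt("expr" | "expr_2021" | "stmt") => only(&["=>", ",", ";"]),
        Sym::Nt("ty" | "path" | "vis") => unimplemented!("not modelled in this sketch"),
        _ => Follow::Any,
    }
}

/// FOLLOW(M): the intersection of FOLLOW(t) for t in LAST(M) \ {ε}.
fn follow(m: &[Tt]) -> Follow {
    last(m)
        .into_iter()
        .filter(|s| *s != Sym::Eps)
        .map(follow_of)
        .fold(Follow::Any, |acc, f| match (acc, f) {
            (Follow::Any, x) | (x, Follow::Any) => x,
            (Follow::Only(a), Follow::Only(b)) => {
                Follow::Only(a.intersection(&b).copied().collect())
            }
        })
}

fn main() {
    // M = `$e:expr $(, $p:pat)*`
    let m = vec![
        Tt::Nt("expr"),
        Tt::Rep(vec![Tt::Tok(","), Tt::Nt("pat")], None, '*'),
    ];
    println!("LAST(M)   = {:?}", last(&m));
    println!("FOLLOW(M) = {:?}", follow(&m));
    // A delimited sequence ends in its closing delimiter, a plain terminal.
    println!("FOLLOW    = {:?}", follow(&[Tt::Delim(")")]));
}
```

For the matcher `$e:expr $(, $p:pat)*`, a fragment can end in either the expression or a pattern, so LAST(M) contains both simple NTs and FOLLOW(M) is the intersection of FOLLOW(expr) and FOLLOW(pat), namely {`=>`, `,`}.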
+r[macro.ambiguity.sets.def.follow.type-first]
 The tokens that can begin a type are, as of this writing, {`(`, `[`, `!`, `*`,
 `&`, `&&`, `?`, lifetimes, `>`, `>>`, `::`, any non-keyword identifier, `super`,
 `self`, `Self`, `extern`, `crate`, `$crate`, `_`, `for`, `impl`, `fn`, `unsafe`,