@@ -13,25 +13,33 @@
 
 Syntactic structure of a regular expression
 
-    Regex -> '' | Alternation
+    Regex -> GlobalMatchingOptionSequence? RegexNode
+    RegexNode -> '' | Alternation
     Alternation -> Concatenation ('|' Concatenation)*
     Concatenation -> (!'|' !')' ConcatComponent)*
     ConcatComponent -> Trivia | Quote | Quantification
     Quantification -> QuantOperand Quantifier?
-    QuantOperand -> Group | CustomCharClass | Atom
-    Group -> GroupStart Regex ')'
+    QuantOperand -> Conditional | Group | CustomCharClass
+                  | Atom | AbsentFunction
+
+    Conditional -> CondStart Concatenation ('|' Concatenation)? ')'
+    CondStart -> KnownCondStart | GroupCondStart
+
+    Group -> GroupStart RegexNode ')'
 
 Custom character classes are a mini-language to their own. We
 support UTS#18 set operators and nested character classes. The
 meaning of some atoms, such as `\b` changes inside a custom
-chararacter class. Below, we have a grammar "scope", that is we say
-"SetOp" to mean "CustomCharactetClass.SetOp", so we don't have to
-abbreviate/obfuscate/disambiguate with ugly names like "CCCSetOp".
+character class. Below, we have a grammar "scope", that is we
+say "SetOp" to mean "CustomCharacterClass.SetOp", so we don't
+have to abbreviate/obfuscate/disambiguate with ugly names like
+"CCCSetOp".
 
 Also, PCRE lets you end in `&&`, but not Oniguruma as it's a set
-operator. We probably want a rule similar to how you can end in `-`
-and that's just the character. Perhaps we also have syntax options
-in case we need a compatibilty mode (it's easy to add here and now)
+operator. We probably want a rule similar to how you can end in
+`-` and that's just the character. Perhaps we also have syntax
+options in case we need a compatibility mode (it's easy to add
+here and now)
 
     CustomCharClass -> Start Set (SetOp Set)* ']'
     Set -> Member+
@@ -46,6 +54,9 @@ Lexical analysis provides the following:
     Quantifier -> `lexQuantifier`
     GroupStart -> `lexGroupStart`
 
+    GroupCondStart -> `lexGroupConditionalStart`
+    KnownCondStart -> `lexKnownCondition`
+
     CustomCharacterClass.Start -> `lexCustomCCStart`
     CustomCharacterClass.SetOp -> `lexCustomCCBinOp`
 
@@ -353,9 +364,9 @@ extension Parser {
   ///
   ///     QuantOperand -> Conditional | Group | CustomCharClass | Atom
   ///                   | AbsentFunction
-  ///     Group -> GroupStart RecursiveRegex ')'
-  ///     Conditional -> ConditionalStart Concatenation ('|' Concatenation)? ')'
-  ///     ConditionalStart -> KnownConditionalStart | GroupConditionalStart
+  ///     Group -> GroupStart RegexNode ')'
+  ///     Conditional -> CondStart Concatenation ('|' Concatenation)? ')'
+  ///     CondStart -> KnownCondStart | GroupCondStart
   ///
   mutating func parseQuantifierOperand() throws -> AST.Node? {
     assert(!source.isEmpty)
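To illustrate the updated grammar, here is a minimal recursive-descent sketch in Swift of the `Group` and `Conditional` productions. It is not the parser touched by this diff: `MiniParser`, `Node`, and `ParseError` are hypothetical names, atoms are assumed to be single characters, and `KnownCondStart` is assumed to be only `(?(` followed by a group number and `)`. Quantifiers, trivia, custom character classes, absent functions, and global matching options are left out.

```swift
// Hypothetical, simplified AST; the real AST.Node is richer.
indirect enum Node: Equatable {
  case empty
  case atom(Character)
  case concatenation([Node])
  case alternation([Node])
  case group(Node)
  case conditional(condition: Int, trueBranch: Node, falseBranch: Node?)
}

struct ParseError: Error, Equatable {
  let message: String
}

struct MiniParser {
  private var chars: [Character]
  private var pos = 0

  init(_ input: String) { chars = Array(input) }

  private var isAtEnd: Bool { pos >= chars.count }
  private func peek() -> Character? { isAtEnd ? nil : chars[pos] }

  // Consume `s` if the remaining input starts with it.
  private mutating func tryEat(_ s: String) -> Bool {
    let wanted = Array(s)
    guard chars[pos...].starts(with: wanted) else { return false }
    pos += wanted.count
    return true
  }

  private mutating func expect(_ s: String) throws {
    guard tryEat(s) else { throw ParseError(message: "expected '\(s)'") }
  }

  // Regex -> RegexNode (GlobalMatchingOptionSequence omitted in this sketch)
  mutating func parse() throws -> Node {
    let node = try parseRegexNode()
    guard isAtEnd else { throw ParseError(message: "unbalanced ')'") }
    return node
  }

  // RegexNode -> '' | Alternation
  // Alternation -> Concatenation ('|' Concatenation)*
  private mutating func parseRegexNode() throws -> Node {
    if isAtEnd || peek() == ")" { return .empty }
    var branches: [Node] = []
    try branches.append(parseConcatenation())
    while tryEat("|") { try branches.append(parseConcatenation()) }
    return branches.count == 1 ? branches[0] : .alternation(branches)
  }

  // Concatenation -> (!'|' !')' ConcatComponent)*
  private mutating func parseConcatenation() throws -> Node {
    var components: [Node] = []
    while let c = peek(), c != "|", c != ")" {
      try components.append(parseOperand())
    }
    return .concatenation(components)
  }

  // QuantOperand -> Conditional | Group | Atom (other alternatives omitted)
  private mutating func parseOperand() throws -> Node {
    // Conditional -> CondStart Concatenation ('|' Concatenation)? ')'
    if tryEat("(?(") {
      var digits = ""
      while let c = peek(), c.isASCII, c.isNumber {
        digits.append(c)
        pos += 1
      }
      guard let number = Int(digits) else {
        throw ParseError(message: "expected a group number")
      }
      try expect(")")
      let trueBranch = try parseConcatenation()
      var falseBranch: Node? = nil
      if tryEat("|") { falseBranch = try parseConcatenation() }
      try expect(")")
      return .conditional(
        condition: number, trueBranch: trueBranch, falseBranch: falseBranch)
    }
    // Group -> GroupStart RegexNode ')'
    if tryEat("(") {
      let inner = try parseRegexNode()
      try expect(")")
      return .group(inner)
    }
    // Atom: any other single character.
    let c = chars[pos]
    pos += 1
    return .atom(c)
  }
}
```

The conditional start is tried before the plain group start because both begin with `(`. A conditional body parses at most two `Concatenation` branches, so a third `|` inside it is rejected, whereas a group body goes through `RegexNode` and accepts any number of alternatives, including none.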