<li><strong><code>s</code></strong> — Dot matches all (aka <em>singleline</em> mode) — <em>Added as a native flag in ES2018</em></li>
55
-
<li><strong><code>x</code></strong> — Free-spacing and line comments (aka <em>extended</em> mode)</li>
56
-
<li><strong><code>A</code></strong> — Astral (requires the Unicode Base addon)</li>
53
+
<li><strong><code>n</code></strong> — Named capture only</li>
54
+
<li><strong><code>s</code></strong> — Dot matches all (<em>singleline</em>) — <em>Added as a native flag in ES2018, but XRegExp always supports it</em></li>
55
+
<li><strong><code>x</code></strong> — Free-spacing and line comments (<em>extended</em>)</li>
56
+
<li><strong><code>A</code></strong> — 21-bit Unicode properties (<em>astral</em>) — <em>Requires the Unicode Base addon</em></li>
57
57
</ul>
58
58
</li>
59
59
<li><strong>Native flags</strong>
60
60
<ul>
61
61
<li><strong><code>g</code></strong> — All matches, or advance <code>lastIndex</code> after matches (<code>global</code>)</li>
62
62
<li><strong><code>i</code></strong> — Case insensitive (<code>ignoreCase</code>)</li>
63
63
<li><strong><code>m</code></strong> — <code>^</code> and <code>$</code> match at newlines (<code>multiline</code>)</li>
64
-
<li><strong><code>u</code></strong> — Handle surrogate pairs as code points and enable <code>\u{…}</code> (<code>unicode</code>) — <em>Requires native ES6 support</em></li>
64
+
<li><strong><code>u</code></strong> — Handle surrogate pairs as code points and enable <code>\u{…}</code> and <code>\p{…}</code> (<code>unicode</code>) — <em>Requires native ES6 support</em></li>
65
65
<li><strong><code>y</code></strong> — Matches must start at <code>lastIndex</code> (<code>sticky</code>) — <em>Requires Firefox 3+ or native ES6 support</em></li>
66
+
<li><strong><code>d</code></strong> — Include indices for capturing groups on match results (<code>hasIndices</code>) — <em>Requires native ES2022 support</em></li>
<p>Specifies that the only valid captures are explicitly named groups of the form <code>(?&lt;name&gt;…)</code>. This allows unnamed <code>(…)</code> parentheses to act as noncapturing groups without the syntactic clumsiness of the expression <code>(?:…)</code>.</p>
74
+
<p>Specifies that the only captures are explicitly named groups of the form <code>(?&lt;name&gt;…)</code>. This allows unnamed <code>(…)</code> parentheses to act as noncapturing groups without the syntactic clumsiness of the expression <code>(?:…)</code>.</p>
74
75
75
76
<h3>Annotations</h3>
76
77
<ul>
77
78
<li><strong>Rationale:</strong> Backreference capturing adds performance overhead and is needed far less often than simple grouping. The <code>n</code> flag frees the <code>(…)</code> syntax from its often-undesired capturing side effect, while still allowing explicitly-named capturing groups.</li>
78
79
<li><strong>Compatibility:</strong> No known problems; the <code>n</code> flag is illegal in native JavaScript regular expressions.</li>
79
-
<li><strong>Prior art:</strong> The <code>n</code> flag comes from .NET.</li>
80
+
<li><strong>Prior art:</strong> The <code>n</code> flag comes from .NET, where it's called "explicit capture."</li>
80
81
</ul>
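<p>As a rough illustration of what explicit capture buys you, the sketch below uses only native JavaScript regexes (it does not call XRegExp). It compares a pattern whose unnamed parentheses capture with the hand-written <code>(?:…)</code> equivalent that a pattern under flag <code>n</code> is described as behaving like. The variable names are illustrative, and native named groups require ES2018.</p>

```javascript
// Native-only sketch; XRegExp itself is not used here.
// Unnamed parentheses capture by default, so the year becomes group 1.
const withUnnamedCapture = /(\d{4})-(?<month>\d{2})/;

// Written out by hand: the unnamed group is noncapturing, and only the
// explicitly named group is captured.
const explicitOnly = /(?:\d{4})-(?<month>\d{2})/;

const capturedBoth = withUnnamedCapture.exec('2024-05');
const capturedNamedOnly = explicitOnly.exec('2024-05');

console.log(capturedBoth.length);            // 3 (full match + 2 groups)
console.log(capturedNamedOnly.length);       // 2 (full match + 1 group)
console.log(capturedNamedOnly.groups.month); // '05'
```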
81
82
82
83
@@ -93,16 +94,16 @@ <h2 id="singleline">Dot matches all <span class="plain">(<code>s</code>)</span><
93
94
<p>Usually, a dot does not match newlines. However, a mode in which dots match any code unit (including newlines) can be as useful as one where dots don't. The <code>s</code> flag allows the mode to be selected on a per-regex basis. Escaped dots (<code>\.</code>) and dots within character classes (<code>[.]</code>) are always equivalent to literal dots. The newline code points are as follows:</p>
94
95
95
96
<ul>
96
-
<li><code>U+000a</code> — Line feed — <code>\n</code></li>
<li><strong>Rationale:</strong> All popular Perl-style regular expression flavors except JavaScript include a flag that allows dots to match newlines. Without this mode, matching any single code unit requires, e.g., <code>[\s\S]</code>, <code>[\0-\uFFFF]</code>, <code>[^]</code> (JavaScript only; doesn't work in some browsers without XRegExp), or god forbid <code>(.|\s)</code>.</li>
105
-
<li><strong>Compatibility:</strong> No known problems; the <code>s</code> flag is illegal in native JavaScript regular expressions.</li>
105
+
<li><strong>Rationale:</strong> All popular Perl-style regular expression flavors except JavaScript (prior to ES2018) include a flag that allows dots to match newlines. Without this mode, matching any single code unit requires, e.g., <code>[\s\S]</code>, <code>[\0-\uFFFF]</code>, <code>[^]</code> (JavaScript only; doesn't work in some browsers without XRegExp), or god forbid <code>(.|\s)</code> (which requires unnecessary backtracking).</li>
106
+
<li><strong>Compatibility:</strong> No known problems; the <code>s</code> flag is illegal in native JavaScript regular expressions prior to ES2018.</li>
106
107
<li><strong>Prior art:</strong> The <code>s</code> flag comes from Perl.</li>
107
108
</ul>
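<p>The difference between the default dot and dot-matches-all mode can be checked with native regexes alone. This is a minimal sketch that assumes an engine with ES2018 support for the native <code>s</code> flag; the variable names are illustrative.</p>

```javascript
// Native-only sketch; assumes ES2018 support for the s flag.
const text = 'first\nsecond';

// By default the dot stops at the newline, so the whole string cannot match.
const defaultDot = /^.+$/.test(text);

// With s, the dot also matches newline code points.
const dotAll = /^.+$/s.test(text);

// The traditional workaround mentioned in the rationale.
const classWorkaround = /^[\s\S]+$/.test(text);

// Line separator U+2028 is also a newline as far as the dot is concerned.
const dotSkipsLineSeparator = /./.test('\u2028');

console.log(defaultDot, dotAll, classWorkaround, dotSkipsLineSeparator);
// false true true false
```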
108
109
@@ -113,13 +114,13 @@ <h3>Annotations</h3>
113
114
114
115
<h2 id="extended">Free-spacing and line comments <span class="plain">(<code>x</code>)</span></h2>
115
116
116
-
<p>This flag has two complementary effects. First, it causes most whitespace to be ignored, so you can free-format the regex pattern for readability. Second, it allows comments with a leading <code>#</code>. Specifically, it turns most whitespace into an "ignore me" metacharacter, and <code>#</code> into an "ignore me, and everything else up to the next newline" metacharacter. They aren't taken as metacharacters within character classes (which means that classes are <em>not</em> free-format, even with <code>x</code>), and as with other metacharacters, you can escape whitespace and <code>#</code> that you want to be taken literally. Of course, you can always use <code>\s</code> to match whitespace.</p>
117
+
<p>This flag has two complementary effects. First, it causes all whitespace recognized natively by <code>\s</code> to be ignored, so you can free-format the regex pattern for readability. Second, it allows comments with a leading <code>#</code>. Specifically, it turns whitespace into an "ignore me" metacharacter, and <code>#</code> into an "ignore me and everything else up to the next newline" metacharacter. They aren't taken as metacharacters within character classes (which means that classes are <em>not</em> free-format even with <code>x</code>, following precedent from most other regex libraries that support <code>x</code>), and as with other metacharacters, you can escape whitespace and <code>#</code> that you want to be taken literally. Of course, you can always use <code>\s</code> to match whitespace.</p>
117
118
118
119
<div class="aside">
119
120
<p>It might be better to think of whitespace and comments as do-nothing (rather than ignore-me) metacharacters. This distinction is important with something like <code>\12 3</code>, which with the <code>x</code> flag is taken as <code>\12</code> followed by <code>3</code>, and not <code>\123</code>. However, quantifiers following whitespace or comments apply to the preceding token, so <code>x +</code> is equivalent to <code>x+</code>.</p>
120
121
</div>
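<p>The rules above can be sketched as a small preprocessor over the pattern string. This is a simplified illustration, not XRegExp's implementation: <code>stripFreeSpacing</code> is a hypothetical helper, it ends comments only at <code>\n</code>, and it treats any of <code>* + ? {</code> after skipped whitespace as a quantifier. It inserts an empty group <code>(?:)</code> where whitespace or a comment separated two tokens, which is one way to get the "do-nothing" behavior described in the aside.</p>

```javascript
// Hypothetical helper, for illustration only (not part of XRegExp).
function stripFreeSpacing(pattern) {
  let out = '';
  let inClass = false;
  let i = 0;
  while (i < pattern.length) {
    const ch = pattern[i];
    if (ch === '\\') {
      // Escaped characters, including escaped whitespace and \#, stay literal.
      out += pattern.slice(i, i + 2);
      i += 2;
    } else if (inClass) {
      // Character classes are not free-format: copy everything verbatim.
      if (ch === ']') inClass = false;
      out += ch;
      i += 1;
    } else if (ch === '[') {
      inClass = true;
      out += ch;
      i += 1;
    } else if (/\s/.test(ch) || ch === '#') {
      // Skip a whole run of whitespace and comments.
      while (i < pattern.length && (/\s/.test(pattern[i]) || pattern[i] === '#')) {
        if (pattern[i] === '#') {
          while (i < pattern.length && pattern[i] !== '\n') i += 1;
        } else {
          i += 1;
        }
      }
      // Keep neighboring tokens apart (so "\12 3" is not read as "\123"),
      // unless a quantifier follows, which must attach to the preceding token.
      const next = pattern[i];
      if (out !== '' && next !== undefined && !'*+?{'.includes(next)) {
        out += '(?:)';
      }
    } else {
      out += ch;
      i += 1;
    }
  }
  return out;
}

const dateSource = stripFreeSpacing(`
  (\\d{4})  # year
  -
  (\\d{2})  # month
`);
const dateMatch = new RegExp('^' + dateSource + '$').exec('2024-05');

console.log(dateSource);                 // (\d{4})(?:)-(?:)(\d{2})
console.log(dateMatch[1], dateMatch[2]); // 2024 05
console.log(stripFreeSpacing('\\12 3')); // \12(?:)3
console.log(stripFreeSpacing('x +'));    // x+
```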
121
122
122
-
<p>The ignored whitespace characters are those matched natively by <code>\s</code>. ES3 whitespace is based on Unicode 2.1.0 or later. ES5 whitespace is based on Unicode 3.0.0 or later, plus <code>U+FEFF</code>. Following are the code points that should be matched by <code>\s</code> according to ES5 and Unicode 4.0.1–6.1.0 (not yet updated for later versions):</p>
123
+
<p>The ignored whitespace characters are those matched natively by <code>\s</code>. ES3 whitespace is based on Unicode 2.1.0 or later. ES5 whitespace is based on Unicode 3.0.0 or later, plus <code>U+FEFF</code>. Following are the code points that should be matched by <code>\s</code> according to ES5 and Unicode 4.0.1:</p>
<p>JavaScript's <code>\s</code> is similar but not equivalent to <code>\p{Z}</code> (the Separator category) from regex libraries that support Unicode categories, including XRegExp's own <a href="../unicode/index.html">Unicode Categories addon</a>. The difference is that <code>\s</code> includes code points <code>U+0009</code>–<code>U+000D</code> and <code>U+FEFF</code>, which are not assigned the Separator category in the Unicode character database.</p>
170
171
171
-
<p>JavaScript's <code>\s</code> is nearly equivalent to <code>\p{White_Space}</code> from the <a href="../unicode/index.html">Unicode Properties addon</a>. The differences are: 1. <code>\p{White_Space}</code> does not include <code>U+FEFF</code> (ZWNBSP). 2. <code>\p{White_Space}</code> includes <code>U+0085</code> (NEL), which is not assigned the Separator category in the Unicode character database.</p>
172
+
<p>JavaScript's <code>\s</code> is nearly equivalent to <code>\p{White_Space}</code> from the <a href="../unicode/index.html">Unicode Properties addon</a>. The differences are: 1. <code>\p{White_Space}</code> does not include <code>U+FEFF</code> (ZWNBSP), and 2. <code>\p{White_Space}</code> includes <code>U+0085</code> (NEL), which is not assigned the Separator category in the Unicode character database.</p>
172
173
173
-
<p>Aside: Not all JavaScript regex syntax is Unicode-aware. According to JavaScript specs, <code>\s</code>, <code>\S</code>, <code>.</code>, <code>^</code>, and <code>$</code> use Unicode-based interpretations of <em>whitespace</em> and <em>newline</em>, while <code>\d</code>, <code>\D</code>, <code>\w</code>, <code>\W</code>, <code>\b</code>, and <code>\B</code> use ASCII-only interpretations of <em>digit</em>, <em>word character</em>, and <em>word boundary</em><!-- (e.g., <code>/a\b/.test("naïve")</code> returns <code>true</code>)-->. Many browsers get some of these details wrong.<!--E.g., Firefox 2 and earlier considers <code>\d</code> and <code>\D</code> to be Unicode-aware. Firefox 3 fixes this bug, making <code>\d</code> equivalent to <code>[0-9]</code>.--></p>
174
+
<p>Aside: Not all JavaScript regex syntax is Unicode-aware. According to JavaScript specs, <code>\s</code>, <code>\S</code>, <code>.</code>, <code>^</code>, and <code>$</code> use Unicode-based interpretations of <em>whitespace</em> and <em>newline</em>, while <code>\d</code>, <code>\D</code>, <code>\w</code>, <code>\W</code>, <code>\b</code>, and <code>\B</code> use ASCII-only interpretations of <em>digit</em>, <em>word character</em>, and <em>word boundary</em><!-- (e.g., <code>/a\b/.test("naïve")</code> returns <code>true</code>)-->. Some browsers and browser versions get aspects of these details wrong.</p>
174
175
175
176
<p>For more details, see <a href="https://blog.stevenlevithan.com/archives/javascript-regex-and-unicode"><em>JavaScript, Regex, and Unicode</em></a>.</p>
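<p>The differences between <code>\s</code>, <code>\p{Z}</code>, and <code>\p{White_Space}</code> described above can be spot-checked with native regexes. This sketch assumes an engine with ES2018 Unicode property escapes (which need the native <code>u</code> flag) and does not use the XRegExp addons; the property names in the <code>checks</code> object are illustrative.</p>

```javascript
// Native-only sketch; assumes ES2018 support for \p{…} with the u flag.
const bom = '\uFEFF'; // ZWNBSP
const nel = '\u0085'; // NEL

const checks = {
  // U+FEFF is whitespace for \s but lacks the White_Space property.
  bomMatchesS: /\s/.test(bom),
  bomMatchesWhiteSpace: /\p{White_Space}/u.test(bom),
  // U+0085 has the White_Space property but is not matched by \s.
  nelMatchesS: /\s/.test(nel),
  nelMatchesWhiteSpace: /\p{White_Space}/u.test(nel),
  // Tab (U+0009) is matched by \s but is not in the Separator category.
  tabMatchesS: /\s/.test('\t'),
  tabMatchesSeparator: /\p{Z}/u.test('\t'),
  // \w and \b are ASCII-only: U+00EF (ï) is not a word character, so a
  // word boundary is found inside "naïve".
  accentedIsWordChar: /\w/.test('\u00EF'),
  boundaryInsideWord: /a\b/.test('na\u00EFve'),
};

console.log(checks);
```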
<p><strong>Requires the <a href="../unicode/index.html">Unicode Base</a> addon.</strong></p>
182
183
183
184
<p>By default, <code>\p{…}</code> and <code>\P{…}</code> support the Basic Multilingual Plane (i.e. code points up to <code>U+FFFF</code>). You can opt-in to full 21-bit Unicode support (with code points up to <code>U+10FFFF</code>) on a per-regex basis by using flag <code>A</code>. In XRegExp, this is called <em>astral mode</em>. You can automatically add flag <code>A</code> for all new regexes by running <code>XRegExp.install('astral')</code>. When in astral mode, <code>\p{…}</code> and <code>\P{…}</code> always match a full code point rather than a code unit, using surrogate pairs for code points above <code>U+FFFF</code>.</p>
184
185
185
186
<pre class="sh_javascript">// Using flag A to match astral code points
186
-
XRegExp('^\\pS$').test('💩'); // -> false
187
-
XRegExp('^\\pS$', 'A').test('💩'); // -> true
188
-
XRegExp('(?A)^\\pS$').test('💩'); // -> true
187
+
XRegExp('^\\p{S}$').test('💩'); // -> false
188
+
XRegExp('^\\p{S}$', 'A').test('💩'); // -> true
189
+
XRegExp('(?A)^\\p{S}$').test('💩'); // -> true
189
190
// Using surrogate pair U+D83D U+DCA9 to represent U+1F4A9 (pile of poo)
<p>Opting in to astral mode disables the use of <code>\p{…}</code> and <code>\P{…}</code> within character classes. In astral mode, use e.g. <code>(\pL|[0-9_])+</code> instead of <code>[\pL0-9_]+</code>.</p>
198
+
<p><strong>Important:</strong> Opting in to astral mode disables the use of <code>\p{…}</code> and <code>\P{…}</code> within character classes. In astral mode, use e.g. <code>(\p{L}|[0-9_])+</code> instead of <code>[\p{L}0-9_]+</code>.</p>
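<p>For comparison, the code unit versus code point distinction can be seen with the native <code>u</code> flag alone. This sketch does not use XRegExp or flag <code>A</code>; it assumes an engine with ES2018 Unicode property escapes. Note that the native <code>u</code> flag, unlike astral mode as described above, accepts <code>\p{…}</code> inside character classes.</p>

```javascript
// Native-only sketch; XRegExp's flag A is not involved here.
const pileOfPoo = '\u{1F4A9}'; // stored as surrogate pair U+D83D U+DCA9

// The string is two code units long, so a single non-unicode dot cannot
// span it, while a unicode-mode dot matches the whole code point.
const codeUnits = pileOfPoo.length;
const dotWithoutU = /^.$/.test(pileOfPoo);
const dotWithU = /^.$/u.test(pileOfPoo);

// U+1F4A9 is in the Symbol category.
const symbolWithU = /^\p{S}$/u.test(pileOfPoo);

// Native unicode mode allows property escapes within character classes.
const classWithProperty = /^[\p{L}0-9_]+$/u.test('na\u00EFve_42');

console.log(codeUnits, dotWithoutU, dotWithU, symbolWithU, classWithProperty);
// 2 false true true true
```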
File: docs/syntax/index.html (2 additions, 2 deletions)
@@ -135,7 +135,7 @@ <h3>Annotations</h3>
135
135
136
136
<h2 id="modeModifier">Leading mode modifier</h2>
137
137
138
-
<p>A mode modifier uses the syntax <code>(?<em>imnsuxA</em>)</code>, where <code><em>imnsuxA</em></code> is any combination of XRegExp flags except <code>g</code> or <code>y</code>. Mode modifiers provide an alternate way to enable the specified flags. XRegExp allows the use of a single mode modifier at the very beginning of a pattern only.</p>
138
+
<p>A mode modifier uses the syntax <code>(?<em>imnsuxA</em>)</code>, where <code><em>imnsuxA</em></code> is any combination of XRegExp flags except <code>g</code>, <code>y</code>, or <code>d</code>. Mode modifiers provide an alternate way to enable the specified flags. XRegExp allows the use of a single mode modifier at the very beginning of a pattern only.</p>
<p>When creating a regex, it's okay to include flags in a mode modifier that are also provided via the separate <code>flags</code> argument. For instance, <code>XRegExp('(?s).+', 's')</code> is valid.</p>
147
147
148
-
<p>Flags <code>g</code> and <code>y</code> cannot be included in a mode modifier, or an error is thrown. This is because <code>g</code> and <code>y</code>, unlike all other flags, have no impact on the meaning of a regex. Rather, they change how particular methods choose to apply the regex. In fact, XRegExp methods provide e.g. <code>scope</code>, <code>sticky</code>, and <code>pos</code> arguments that allow you to use and change such functionality on a per-run rather than per-regex basis. Also consider that it makes sense to apply all other flags to a particular subsection of a regex, whereas flags <code>g</code> and <code>y</code> only make sense when applied to the regex as a whole. Allowing <code>g</code> and <code>y</code> in a mode modifier might therefore create future compatibility problems.</p>
148
+
<p>Flags <code>g</code>, <code>y</code>, and <code>d</code> cannot be included in a mode modifier, or an error is thrown. This is because <code>g</code>, <code>y</code>, and <code>d</code>, unlike all other flags, have no impact on the meaning of a regex. Rather, they change how particular methods choose to apply the regex. XRegExp methods provide e.g. <code>scope</code>, <code>sticky</code>, and <code>pos</code> arguments that allow you to use and change such functionality on a per-run rather than per-regex basis. Additionally, consider that it makes sense to apply all other flags to a particular subsection of a regex, whereas flags <code>g</code>, <code>y</code>, and <code>d</code> only make sense when applied to the regex as a whole. Allowing <code>g</code>, <code>y</code>, and <code>d</code> in a mode modifier might therefore create future compatibility problems.</p>
149
149
150
150
<p>The use of unknown flags in a mode modifier causes an error to be thrown. However, XRegExp addons can add new flags that are then automatically valid within mode modifiers.</p>
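<p>The claim that flags <code>g</code>, <code>y</code>, and <code>d</code> affect how a regex is applied rather than what it means can be illustrated with native regexes. In this sketch the same pattern source is compiled three ways; only the search position and the shape of the result differ. It assumes an engine that supports the native <code>d</code> flag, and the variable names are illustrative.</p>

```javascript
// Native-only sketch; assumes support for the y and d flags.
const source = '\\d+';
const input = 'ab 123';

const plain = new RegExp(source);
const sticky = new RegExp(source, 'y');
const withIndices = new RegExp(source, 'd');

// Same pattern, but y anchors the attempt at lastIndex.
const plainFound = plain.test(input);     // digits exist somewhere
sticky.lastIndex = 0;
const stickyAtStart = sticky.test(input); // no digits at index 0
sticky.lastIndex = 3;
const stickyAtThree = sticky.test(input); // digits start at index 3

// d does not change what matches; it adds position data to the result.
const indexedMatch = withIndices.exec(input);

console.log(plainFound, stickyAtStart, stickyAtThree); // true false true
console.log(indexedMatch[0], indexedMatch.indices[0]); // '123' [ 3, 6 ]
```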