Commit b8f8d5c

Add native /d flag to docs, and other improvements to flag documentation
1 parent 220c92a commit b8f8d5c

4 files changed

+35
-34
lines changed

docs/flags/index.html

Lines changed: 26 additions & 25 deletions
@@ -33,10 +33,10 @@ <h1 class="subtitle">The one of a kind JavaScript regular expression library</h1
3333
<h2>Table of contents</h2>
3434
<ul>
3535
<li><a href="#about">About flags</a></li>
36-
<li><a href="#explicitCapture">Explicit capture (n)</a></li>
36+
<li><a href="#explicitCapture">Named capture only (n)</a></li>
3737
<li><a href="#singleline">Dot matches all (s)</a></li>
3838
<li><a href="#extended">Free-spacing and line comments (x)</a></li>
39-
<li><a href="#astral">Astral (A)</a></li>
39+
<li><a href="#astral">21-bit Unicode properties (A)</a></li>
4040
</ul>
4141
</div>
4242
</div>
@@ -50,33 +50,34 @@ <h2 id="about">About flags</h2>
5050
<ul>
5151
<li><strong>New flags</strong>
5252
<ul>
53-
<li><strong><code>n</code></strong> &mdash; Explicit capture</li>
54-
<li><strong><code>s</code></strong> &mdash; Dot matches all (aka <em>singleline</em> mode) &mdash; <em>Added as a native flag in ES2018</em></li>
55-
<li><strong><code>x</code></strong> &mdash; Free-spacing and line comments (aka <em>extended</em> mode)</li>
56-
<li><strong><code>A</code></strong> &mdash; Astral (requires the Unicode Base addon)</li>
53+
<li><strong><code>n</code></strong> &mdash; Named capture only</li>
54+
<li><strong><code>s</code></strong> &mdash; Dot matches all (<em>singleline</em>) &mdash; <em>Added as a native flag in ES2018, but XRegExp always supports it</em></li>
55+
<li><strong><code>x</code></strong> &mdash; Free-spacing and line comments (<em>extended</em>)</li>
56+
<li><strong><code>A</code></strong> &mdash; 21-bit Unicode properties (<em>astral</em>) &mdash; <em>Requires the Unicode Base addon</em></li>
5757
</ul>
5858
</li>
5959
<li><strong>Native flags</strong>
6060
<ul>
6161
<li><strong><code>g</code></strong> &mdash; All matches, or advance <code>lastIndex</code> after matches (<code>global</code>)</li>
6262
<li><strong><code>i</code></strong> &mdash; Case insensitive (<code>ignoreCase</code>)</li>
6363
<li><strong><code>m</code></strong> &mdash; <code>^</code> and <code>$</code> match at newlines (<code>multiline</code>)</li>
64-
<li><strong><code>u</code></strong> &mdash; Handle surrogate pairs as code points and enable <code>\u{&hellip;}</code> (<code>unicode</code>) &mdash; <em>Requires native ES6 support</em></li>
64+
<li><strong><code>u</code></strong> &mdash; Handle surrogate pairs as code points and enable <code>\u{&hellip;}</code> and <code>\p{&hellip;}</code> (<code>unicode</code>) &mdash; <em>Requires native ES6 support</em></li>
6565
<li><strong><code>y</code></strong> &mdash; Matches must start at <code>lastIndex</code> (<code>sticky</code>) &mdash; <em>Requires Firefox 3+ or native ES6 support</em></li>
66+
<li><strong><code>d</code></strong> &mdash; Include indices for capturing groups on match results (<code>hasIndices</code>) &mdash; <em>Requires native ES2021 support</em></li>
6667
</ul>
6768
</li>
6869
</ul>
6970

7071

71-
<h2 id="explicitCapture">Explicit capture <span class="plain">(<code>n</code>)</span></h2>
72+
<h2 id="explicitCapture">Named capture only <span class="plain">(<code>n</code>)</span></h2>
7273

73-
<p>Specifies that the only valid captures are explicitly named groups of the form <code>(?&lt;name>&hellip;)</code>. This allows unnamed <code>(&hellip;)</code> parentheses to act as noncapturing groups without the syntactic clumsiness of the expression <code>(?:&hellip;)</code>.</p>
74+
<p>Specifies that the only captures are explicitly named groups of the form <code>(?&lt;name>&hellip;)</code>. This allows unnamed <code>(&hellip;)</code> parentheses to act as noncapturing groups without the syntactic clumsiness of the expression <code>(?:&hellip;)</code>.</p>
7475

7576
<h3>Annotations</h3>
7677
<ul>
7778
<li><strong>Rationale:</strong> Backreference capturing adds performance overhead and is needed far less often than simple grouping. The <code>n</code> flag frees the <code>(&hellip;)</code> syntax from its often-undesired capturing side effect, while still allowing explicitly-named capturing groups.</li>
7879
<li><strong>Compatibility:</strong> No known problems; the <code>n</code> flag is illegal in native JavaScript regular expressions.</li>
79-
<li><strong>Prior art:</strong> The <code>n</code> flag comes from .NET.</li>
80+
<li><strong>Prior art:</strong> The <code>n</code> flag comes from .NET, where it's called "explicit capture."</li>
8081
</ul>
8182

8283
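Note on the `d` entry added in the hunk above: the following is a minimal sketch of what "indices for capturing groups" looks like on a match result. It uses a plain native regex literal rather than XRegExp, and it assumes an engine that implements the `d` flag (for example, a recent Node.js release). The pattern, the input string, and the variable names are illustrative only and do not come from the commit.

```javascript
// Native RegExp only; the names below are made up for illustration.
const dateRe = /(?<year>\d{4})-(?<month>\d{2})/d;
const dateMatch = dateRe.exec('on 2024-05');

// With flag d, the match result carries an `indices` array that holds
// [start, end] pairs (end is exclusive) for the full match and each group.
const wholeSpan = dateMatch.indices[0];           // span of the full match
const yearSpan = dateMatch.indices.groups.year;   // span of the named group
const monthSpan = dateMatch.indices[2];           // span of the second group

console.log(dateRe.hasIndices, wholeSpan, yearSpan, monthSpan);
```

The spans can be passed directly to `String.prototype.slice` to recover each captured substring.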

@@ -93,16 +94,16 @@ <h2 id="singleline">Dot matches all <span class="plain">(<code>s</code>)</span><
9394
<p>Usually, a dot does not match newlines. However, a mode in which dots match any code unit (including newlines) can be as useful as one where dots don't. The <code>s</code> flag allows the mode to be selected on a per-regex basis. Escaped dots (<code>\.</code>) and dots within character classes (<code>[.]</code>) are always equivalent to literal dots. The newline code points are as follows:</p>
9495

9596
<ul>
96-
<li><code>U+000a</code> &mdash; Line feed &mdash; <code>\n</code></li>
97-
<li><code>U+000d</code> &mdash; Carriage return &mdash; <code>\r</code></li>
97+
<li><code>U+000A</code> &mdash; Line feed &mdash; <code>\n</code></li>
98+
<li><code>U+000D</code> &mdash; Carriage return &mdash; <code>\r</code></li>
9899
<li><code>U+2028</code> &mdash; Line separator</li>
99100
<li><code>U+2029</code> &mdash; Paragraph separator</li>
100101
</ul>
101102

102103
<h3>Annotations</h3>
103104
<ul>
104-
<li><strong>Rationale:</strong> All popular Perl-style regular expression flavors except JavaScript include a flag that allows dots to match newlines. Without this mode, matching any single code unit requires, e.g., <code>[\s\S]</code>, <code>[\0-\uFFFF]</code>, <code>[^]</code> (JavaScript only; doesn't work in some browsers without XRegExp), or god forbid <code>(.|\s)</code>.</li>
105-
<li><strong>Compatibility:</strong> No known problems; the <code>s</code> flag is illegal in native JavaScript regular expressions.</li>
105+
<li><strong>Rationale:</strong> All popular Perl-style regular expression flavors except JavaScript (prior to ES2018) include a flag that allows dots to match newlines. Without this mode, matching any single code unit requires, e.g., <code>[\s\S]</code>, <code>[\0-\uFFFF]</code>, <code>[^]</code> (JavaScript only; doesn't work in some browsers without XRegExp), or god forbid <code>(.|\s)</code> (which requires unnecessary backtracking).</li>
106+
<li><strong>Compatibility:</strong> No known problems; the <code>s</code> flag is illegal in native JavaScript regular expressions prior to ES2018.</li>
106107
<li><strong>Prior art:</strong> The <code>s</code> flag comes from Perl.</li>
107108
</ul>
108109
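Note on the `s` flag text in the hunk above: a small sketch of the behavior it describes, assuming an engine with native ES2018 `s` support. It uses native regex literals only; the sample string and variable names are illustrative.

```javascript
// Native RegExp only; sample text and names are made up for illustration.
const sampleText = 'first\nsecond';

// Without flag s, the dot refuses to match the line feed between the words.
const matchesWithoutS = /first.second/.test(sampleText);

// With flag s, the dot matches any character, newlines included.
const matchesWithS = /first.second/s.test(sampleText);

// The [\s\S] workaround mentioned in the rationale behaves like dot with s.
const matchesWorkaround = /first[\s\S]second/.test(sampleText);

console.log(matchesWithoutS, matchesWithS, matchesWorkaround);
```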

@@ -113,13 +114,13 @@ <h3>Annotations</h3>
113114

114115
<h2 id="extended">Free-spacing and line comments <span class="plain">(<code>x</code>)</span></h2>
115116

116-
<p>This flag has two complementary effects. First, it causes most whitespace to be ignored, so you can free-format the regex pattern for readability. Second, it allows comments with a leading <code>#</code>. Specifically, it turns most whitespace into an "ignore me" metacharacter, and <code>#</code> into an "ignore me, and everything else up to the next newline" metacharacter. They aren't taken as metacharacters within character classes (which means that classes are <em>not</em> free-format, even with <code>x</code>), and as with other metacharacters, you can escape whitespace and <code>#</code> that you want to be taken literally. Of course, you can always use <code>\s</code> to match whitespace.</p>
117+
<p>This flag has two complementary effects. First, it causes all whitespace recognized natively by <code>\s</code> to be ignored, so you can free-format the regex pattern for readability. Second, it allows comments with a leading <code>#</code>. Specifically, it turns whitespace into an "ignore me" metacharacter, and <code>#</code> into an "ignore me and everything else up to the next newline" metacharacter. They aren't taken as metacharacters within character classes (which means that classes are <em>not</em> free-format even with <code>x</code>, following precedent from most other regex libraries that support <code>x</code>), and as with other metacharacters, you can escape whitespace and <code>#</code> that you want to be taken literally. Of course, you can always use <code>\s</code> to match whitespace.</p>
117118

118119
<div class="aside">
119120
<p>It might be better to think of whitespace and comments as do-nothing (rather than ignore-me) metacharacters. This distinction is important with something like <code>\12&nbsp;3</code>, which with the <code>x</code> flag is taken as <code>\12</code> followed by <code>3</code>, and not <code>\123</code>. However, quantifiers following whitespace or comments apply to the preceding token, so <code>x&nbsp;+</code> is equivalent to <code>x+</code>.</p>
120121
</div>
121122

122-
<p>The ignored whitespace characters are those matched natively by <code>\s</code>. ES3 whitespace is based on Unicode 2.1.0 or later. ES5 whitespace is based on Unicode 3.0.0 or later, plus <code>U+FEFF</code>. Following are the code points that should be matched by <code>\s</code> according to ES5 and Unicode 4.0.1&ndash;6.1.0 (not yet updated for later versions):</p>
123+
<p>The ignored whitespace characters are those matched natively by <code>\s</code>. ES3 whitespace is based on Unicode 2.1.0 or later. ES5 whitespace is based on Unicode 3.0.0 or later, plus <code>U+FEFF</code>. Following are the code points that should be matched by <code>\s</code> according to ES5 and Unicode 4.0.1:</p>
123124

124125
<ul style="-webkit-column-count:3; -moz-column-count:3; column-count:3;">
125126
<li><code>U+0009</code> &mdash; Tab &mdash; <code>\t</code></li>
@@ -168,33 +169,33 @@ <h3>Annotations</h3>
168169
<div class="aside">
169170
<p>JavaScript's <code>\s</code> is similar but not equivalent to <code>\p{Z}</code> (the Separator category) from regex libraries that support Unicode categories, including XRegExp's own <a href="../unicode/index.html">Unicode Categories addon</a>. The difference is that <code>\s</code> includes code points <code>U+0009</code>&ndash;<code>U+000D</code> and <code>U+FEFF</code>, which are not assigned the Separator category in the Unicode character database.</p>
170171

171-
<p>JavaScript's <code>\s</code> is nearly equivalent to <code>\p{White_Space}</code> from the <a href="../unicode/index.html">Unicode Properties addon</a>. The differences are: 1. <code>\p{White_Space}</code> does not include <code>U+FEFF</code> (ZWNBSP). 2. <code>\p{White_Space}</code> includes <code>U+0085</code> (NEL), which is not assigned the Separator category in the Unicode character database.</p>
172+
<p>JavaScript's <code>\s</code> is nearly equivalent to <code>\p{White_Space}</code> from the <a href="../unicode/index.html">Unicode Properties addon</a>. The differences are: 1. <code>\p{White_Space}</code> does not include <code>U+FEFF</code> (ZWNBSP), and 2. <code>\p{White_Space}</code> includes <code>U+0085</code> (NEL), which is not assigned the Separator category in the Unicode character database.</p>
172173

173-
<p>Aside: Not all JavaScript regex syntax is Unicode-aware. According to JavaScript specs, <code>\s</code>, <code>\S</code>, <code>.</code>, <code>^</code>, and <code>$</code> use Unicode-based interpretations of <em>whitespace</em> and <em>newline</em>, while <code>\d</code>, <code>\D</code>, <code>\w</code>, <code>\W</code>, <code>\b</code>, and <code>\B</code> use ASCII-only interpretations of <em>digit</em>, <em>word character</em>, and <em>word boundary</em><!-- (e.g., <code>/a\b/.test("na&iuml;ve")</code> returns <code>true</code>)-->. Many browsers get some of these details wrong. <!--E.g., Firefox 2 and earlier considers <code>\d</code> and <code>\D</code> to be Unicode-aware. Firefox 3 fixes this bug, making <code>\d</code> equivalent to <code>[0-9]</code>.--></p>
174+
<p>Aside: Not all JavaScript regex syntax is Unicode-aware. According to JavaScript specs, <code>\s</code>, <code>\S</code>, <code>.</code>, <code>^</code>, and <code>$</code> use Unicode-based interpretations of <em>whitespace</em> and <em>newline</em>, while <code>\d</code>, <code>\D</code>, <code>\w</code>, <code>\W</code>, <code>\b</code>, and <code>\B</code> use ASCII-only interpretations of <em>digit</em>, <em>word character</em>, and <em>word boundary</em><!-- (e.g., <code>/a\b/.test("na&iuml;ve")</code> returns <code>true</code>)-->. Some browsers and browser versions get aspects of these details wrong.</p>
174175

175176
<p>For more details, see <a href="https://blog.stevenlevithan.com/archives/javascript-regex-and-unicode"><em>JavaScript, Regex, and Unicode</em></a>.</p>
176177
</div>
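Note on the aside above: the differences it lists between `\s`, `\p{Z}`, and `\p{White_Space}` can be checked with a short sketch. This uses native Unicode property escapes under the `u` flag (assuming an ES2018-capable engine) as a stand-in for the XRegExp Unicode addons, which are not loaded here. Variable names are illustrative.

```javascript
// Native RegExp with flag u stands in for the XRegExp Unicode addons here.
const zwnbsp = '\uFEFF'; // U+FEFF, zero width no-break space (BOM)
const nel = '\u0085';    // U+0085, next line

// \s includes U+FEFF but not U+0085.
const sMatchesZwnbsp = /\s/.test(zwnbsp);
const sMatchesNel = /\s/.test(nel);

// \p{White_Space} is the other way around.
const wsMatchesZwnbsp = /\p{White_Space}/u.test(zwnbsp);
const wsMatchesNel = /\p{White_Space}/u.test(nel);

// A tab is matched by \s but is not in the Separator category.
const sMatchesTab = /\s/.test('\t');
const zMatchesTab = /\p{Z}/u.test('\t');

console.log({ sMatchesZwnbsp, sMatchesNel, wsMatchesZwnbsp, wsMatchesNel, sMatchesTab, zMatchesTab });
```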
177178

178179

179-
<h2 id="astral">Astral <span class="plain">(<code>A</code>)</span></h2>
180+
<h2 id="astral">21-bit Unicode properties <span class="plain">(<code>A</code>)</span></h2>
180181

181182
<p><strong>Requires the <a href="../unicode/index.html">Unicode Base</a> addon.</strong></p>
182183

183184
<p>By default, <code>\p{&hellip;}</code> and <code>\P{&hellip;}</code> support the Basic Multilingual Plane (i.e. code points up to <code>U+FFFF</code>). You can opt-in to full 21-bit Unicode support (with code points up to <code>U+10FFFF</code>) on a per-regex basis by using flag <code>A</code>. In XRegExp, this is called <em>astral mode</em>. You can automatically add flag <code>A</code> for all new regexes by running <code>XRegExp.install('astral')</code>. When in astral mode, <code>\p{&hellip;}</code> and <code>\P{&hellip;}</code> always match a full code point rather than a code unit, using surrogate pairs for code points above <code>U+FFFF</code>.</p>
184185

185186
<pre class="sh_javascript">// Using flag A to match astral code points
186-
XRegExp('^\\pS$').test('💩'); // -> false
187-
XRegExp('^\\pS$', 'A').test('💩'); // -> true
188-
XRegExp('(?A)^\\pS$').test('💩'); // -> true
187+
XRegExp('^\\p{S}$').test('💩'); // -> false
188+
XRegExp('^\\p{S}$', 'A').test('💩'); // -> true
189+
XRegExp('(?A)^\\p{S}$').test('💩'); // -> true
189190
// Using surrogate pair U+D83D U+DCA9 to represent U+1F4A9 (pile of poo)
190-
XRegExp('(?A)^\\pS$').test('\uD83D\uDCA9'); // -> true
191+
XRegExp('(?A)^\\p{S}$').test('\uD83D\uDCA9'); // -> true
191192

192193
// Implicit flag A
193194
XRegExp.install('astral');
194-
XRegExp('^\\pS$').test('💩'); // -> true
195+
XRegExp('^\\p{S}$').test('💩'); // -> true
195196
</pre>
196197

197-
<p>Opting in to astral mode disables the use of <code>\p{&hellip;}</code> and <code>\P{&hellip;}</code> within character classes. In astral mode, use e.g. <code>(\pL|[0-9_])+</code> instead of <code>[\pL0-9_]+</code>.</p>
198+
<p><strong>Important:</strong> Opting in to astral mode disables the use of <code>\p{&hellip;}</code> and <code>\P{&hellip;}</code> within character classes. In astral mode, use e.g. <code>(\p{L}|[0-9_])+</code> instead of <code>[\p{L}0-9_]+</code>.</p>
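Note on code units versus code points in the astral section above: the sketch below shows why a character such as U+1F4A9 needs special handling. It deliberately uses the native `u` flag rather than XRegExp's `A` flag, so it illustrates the underlying surrogate-pair issue only and is not a demonstration of the addon. Variable names are illustrative.

```javascript
// Native RegExp only; this does not load XRegExp or its Unicode Base addon.
const pileOfPoo = '💩'; // U+1F4A9, outside the Basic Multilingual Plane

// The string is stored as two UTF-16 code units (a surrogate pair).
const codeUnitCount = pileOfPoo.length;
const isSurrogatePair = pileOfPoo === '\uD83D\uDCA9';

// Without flag u, a dot matches one code unit, so it cannot span the pair.
const dotMatchesWithoutU = /^.$/.test(pileOfPoo);

// With flag u, the regex works on whole code points.
const dotMatchesWithU = /^.$/u.test(pileOfPoo);
const symbolMatchesWithU = /^\p{S}$/u.test(pileOfPoo);

console.log(codeUnitCount, isSurrogatePair, dotMatchesWithoutU, dotMatchesWithU, symbolMatchesWithU);
```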
198199

199200
<h3>Annotations</h3>
200201
<ul>

docs/syntax/index.html

Lines changed: 2 additions & 2 deletions
@@ -135,7 +135,7 @@ <h3>Annotations</h3>
135135

136136
<h2 id="modeModifier">Leading mode modifier</h2>
137137

138-
<p>A mode modifier uses the syntax <code>(?<em>imnsuxA</em>)</code>, where <code><em>imnsuxA</em></code> is any combination of XRegExp flags except <code>g</code> or <code>y</code>. Mode modifiers provide an alternate way to enable the specified flags. XRegExp allows the use of a single mode modifier at the very beginning of a pattern only.</p>
138+
<p>A mode modifier uses the syntax <code>(?<em>imnsuxA</em>)</code>, where <code><em>imnsuxA</em></code> is any combination of XRegExp flags except <code>g</code>, <code>y</code>, or <code>d</code>. Mode modifiers provide an alternate way to enable the specified flags. XRegExp allows the use of a single mode modifier at the very beginning of a pattern only.</p>
139139

140140
<h3 style="margin-top:20px;">Example</h3>
141141
<pre class="sh_javascript">const regex = XRegExp('(?im)^[a-z]+$');
@@ -145,7 +145,7 @@ <h3 style="margin-top:20px;">Example</h3>
145145

146146
<p>When creating a regex, it's okay to include flags in a mode modifier that are also provided via the separate <code>flags</code> argument. For instance, <code>XRegExp('(?s).+', 's')</code> is valid.</p>
147147

148-
<p>Flags <code>g</code> and <code>y</code> cannot be included in a mode modifier, or an error is thrown. This is because <code>g</code> and <code>y</code>, unlike all other flags, have no impact on the meaning of a regex. Rather, they change how particular methods choose to apply the regex. In fact, XRegExp methods provide e.g. <code>scope</code>, <code>sticky</code>, and <code>pos</code> arguments that allow you to use and change such functionality on a per-run rather than per-regex basis. Also consider that it makes sense to apply all other flags to a particular subsection of a regex, whereas flags <code>g</code> and <code>y</code> only make sense when applied to the regex as a whole. Allowing <code>g</code> and <code>y</code> in a mode modifier might therefore create future compatibility problems.</p>
148+
<p>Flags <code>g</code>, <code>y</code>, and <code>d</code> cannot be included in a mode modifier, or an error is thrown. This is because <code>g</code>, <code>y</code>, and <code>d</code>, unlike all other flags, have no impact on the meaning of a regex. Rather, they change how particular methods choose to apply the regex. XRegExp methods provide e.g. <code>scope</code>, <code>sticky</code>, and <code>pos</code> arguments that allow you to use and change such functionality on a per-run rather than per-regex basis. Additionally, consider that it makes sense to apply all other flags to a particular subsection of a regex, whereas flags <code>g</code>, <code>y</code>, and <code>d</code> only make sense when applied to the regex as a whole. Allowing <code>g</code>, <code>y</code>, and <code>d</code> in a mode modifier might therefore create future compatibility problems.</p>
149149

150150
<p>The use of unknown flags in a mode modifier causes an error to be thrown. However, XRegExp addons can add new flags that are then automatically valid within mode modifiers.</p>
151151

src/xregexp.js

Lines changed: 4 additions & 4 deletions
@@ -522,17 +522,17 @@ function setNamespacing(on) {
522522
* @param {String|RegExp} pattern Regex pattern string, or an existing regex object to copy.
523523
* @param {String} [flags] Any combination of flags.
524524
* Native flags:
525-
* - `d` - indices for groups (ES2021)
525+
* - `d` - indices for capturing groups (ES2021)
526526
* - `g` - global
527527
* - `i` - ignore case
528528
* - `m` - multiline anchors
529529
* - `u` - unicode (ES6)
530530
* - `y` - sticky (Firefox 3+, ES6)
531531
* Additional XRegExp flags:
532-
* - `n` - explicit capture
532+
* - `n` - named capture only
533533
* - `s` - dot matches all (aka singleline) - works even when not natively supported
534534
* - `x` - free-spacing and line comments (aka extended)
535-
* - `A` - astral (requires the Unicode Base addon)
535+
* - `A` - 21-bit Unicode properties (aka astral) - requires the Unicode Base addon
536536
* Flags cannot be provided when constructing one `RegExp` from another.
537537
* @returns {RegExp} Extended regular expression object.
538538
* @example
@@ -1885,7 +1885,7 @@ XRegExp.addToken(
18851885

18861886
/*
18871887
* Capturing group; match the opening parenthesis only. Required for support of named capturing
1888-
* groups. Also adds explicit capture mode (flag n).
1888+
* groups. Also adds named capture only mode (flag n).
18891889
*/
18901890
XRegExp.addToken(
18911891
/\((?!\?)/,

types/index.d.ts

Lines changed: 3 additions & 3 deletions
@@ -16,18 +16,18 @@ export = XRegExp;
1616
* @param flags - Any combination of flags.
1717
*
1818
* Native flags:
19-
* - `d` - indices for groups (ES2021)
19+
* - `d` - indices for capturing groups (ES2021)
2020
* - `g` - global
2121
* - `i` - ignore case
2222
* - `m` - multiline anchors
2323
* - `u` - unicode (ES6)
2424
* - `y` - sticky (Firefox 3+, ES6)
2525
*
2626
* Additional XRegExp flags:
27-
* - `n` - explicit capture
27+
* - `n` - named capture only
2828
* - `s` - dot matches all (aka singleline) - works even when not natively supported
2929
* - `x` - free-spacing and line comments (aka extended)
30-
* - `A` - astral (requires the Unicode Base addon)
30+
* - `A` - 21-bit Unicode properties (aka astral) - requires the Unicode Base addon
3131
*
3232
* **Flags cannot be provided when constructing one `RegExp` from another.**
3333
* @returns Extended regular expression object.
