Commit 93e4b50

maint: introduce LintMan to aid in tracking/updating values and fix pcre2limits.3 for the correct name length (#776)
Allow values in the documentation to be tagged with the name of a `#define` so that they can be updated programmatically. Update the value for MAX_NAME_SIZE in pcre2limits.3, which has been out of date since ced3b0f (Increase name length to 128, 2024-03-11), and while at it, improve its description and add a tag for a related variable. For completeness, also add a tag to the same value in pcre2pattern.3, update the configuration for VMS, which has been out of date since 6c670c7 (Update overlooked cmake update of name size to 128, 2024-03-11), and run LintMan from UpdateAlways so it can be used in a developer tree.
1 parent 341f424 commit 93e4b50

10 files changed: +106 -17 lines changed

doc/html/pcre2limits.html

Lines changed: 6 additions & 3 deletions
@@ -64,8 +64,11 @@ <h2>
 a compile context.
 </p>
 <p>
-The maximum length of name for a named capture group is 32 code units, and the
-maximum number of such groups is 10000.
+The maximum length of the name for a named capture group as well as the number
+of such groups is configurable at build time. The maximum length for the name
+defaults to
+128 code units, and the maximum number of such groups to
+10000.
 </p>
 <p>
 The maximum length of a name in a (*MARK), (*PRUNE), (*SKIP), or (*THEN) verb
@@ -96,7 +99,7 @@ <h2>
 REVISION
 </h2>
 <p>
-Last updated: 16 August 2023
+Last updated: 17 August 2025
 <br>
 Copyright &copy; 1997-2023 University of Cambridge.
 <br>

doc/html/pcre2pattern.html

Lines changed: 3 additions & 3 deletions
@@ -2007,8 +2007,8 @@ <h2><a name="SEC18" href="#TOC1">NAMED CAPTURE GROUPS</a></h2>
 </p>
 <p>
 In PCRE2, a capture group can be named in one of three ways: (?&#60;name&#62;...) or
-(?'name'...) as in Perl, or (?P&#60;name&#62;...) as in Python. Names may be up to 128
-code units long. When PCRE2_UTF is not set, they may contain only ASCII
+(?'name'...) as in Perl, or (?P&#60;name&#62;...) as in Python. Names may be up to
+128 code units long. When PCRE2_UTF is not set, they may contain only ASCII
 alphanumeric characters and underscores, but must start with a non-digit. When
 PCRE2_UTF is set, the syntax of group names is extended to allow any Unicode
 letter or Unicode decimal digit. In other words, group names must match one of
@@ -4183,7 +4183,7 @@ <h2><a name="SEC33" href="#TOC1">AUTHOR</a></h2>
 </p>
 <h2><a name="SEC34" href="#TOC1">REVISION</a></h2>
 <p>
-Last updated: 28 March 2025
+Last updated: 17 August 2025
 <br>
 Copyright &copy; 1997-2024 University of Cambridge.
 <br>

doc/pcre2.txt

Lines changed: 6 additions & 4 deletions
@@ -6246,8 +6246,10 @@ SIZE AND OTHER LIMITATIONS
 is set to 250. An application can change this limit by calling
 pcre2_set_parens_nest_limit() to set the limit in a compile context.

-The maximum length of name for a named capture group is 32 code units,
-and the maximum number of such groups is 10000.
+The maximum length of the name for a named capture group as well as the
+number of such groups is configurable at build time. The maximum length
+for the name defaults to 128 code units, and the maximum number of such
+groups to 10000.

 The maximum length of a name in a (*MARK), (*PRUNE), (*SKIP), or
 (*THEN) verb is 255 code units for the 8-bit library and 65535 code
@@ -6270,7 +6272,7 @@ AUTHOR

 REVISION

-Last updated: 16 August 2023
+Last updated: 17 August 2025
 Copyright (c) 1997-2023 University of Cambridge.


@@ -10755,7 +10757,7 @@ AUTHOR

 REVISION

-Last updated: 28 March 2025
+Last updated: 17 August 2025
 Copyright (c) 1997-2024 University of Cambridge.


doc/pcre2limits.3

Lines changed: 8 additions & 3 deletions
@@ -47,8 +47,13 @@ when PCRE2 is built; if not, the default is set to 250. An application can
 change this limit by calling pcre2_set_parens_nest_limit() to set the limit in
 a compile context.
 .P
-The maximum length of name for a named capture group is 32 code units, and the
-maximum number of such groups is 10000.
+The maximum length of the name for a named capture group as well as the number
+of such groups is configurable at build time. The maximum length for the name
+defaults to
+.\" DEFINE MAX_NAME_SIZE
+128 code units, and the maximum number of such groups to
+.\" DEFINE MAX_NAME_COUNT
+10000.
 .P
 The maximum length of a name in a (*MARK), (*PRUNE), (*SKIP), or (*THEN) verb
 is 255 code units for the 8-bit library and 65535 code units for the 16-bit and
@@ -76,6 +81,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 16 August 2023
+Last updated: 17 August 2025
 Copyright (c) 1997-2023 University of Cambridge.
 .fi
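
The odd line breaks in the new paragraph follow from the tagging scheme: a roff line beginning with `.\"` is a comment, so each tag sits on its own line and the value it describes has to start the next line. The Python sketch below illustrates that the comments vanish from the rendered text. It is only a rough stand-in for roff fill mode, and `render_fill` is a hypothetical helper name, not something in the PCRE2 tree.

```python
def render_fill(lines):
    """Rough stand-in for roff fill mode: drop comment lines, join the rest."""
    # Lines starting with .\" are roff comments and produce no output.
    text = [ln for ln in lines if not ln.startswith('.\\"')]
    return ' '.join(text)


# The tagged fragment from pcre2limits.3, one list entry per source line.
source = [
    'defaults to',
    '.\\" DEFINE MAX_NAME_SIZE',
    '128 code units, and the maximum number of such groups to',
    '.\\" DEFINE MAX_NAME_COUNT',
    '10000.',
]
rendered = render_fill(source)
print(rendered)
```

The joined text reads as one ordinary sentence, which is consistent with the HTML and text versions of the page showing the same wording without any tags.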

doc/pcre2pattern.3

Lines changed: 4 additions & 3 deletions
@@ -2015,8 +2015,9 @@ the naming of capture groups. This feature was not added to Perl until release
 using the Python syntax. PCRE2 supports both the Perl and the Python syntax.
 .P
 In PCRE2, a capture group can be named in one of three ways: (?<name>...) or
-(?'name'...) as in Perl, or (?P<name>...) as in Python. Names may be up to 128
-code units long. When PCRE2_UTF is not set, they may contain only ASCII
+(?'name'...) as in Perl, or (?P<name>...) as in Python. Names may be up to
+.\" DEFINE MAX_NAME_SIZE
+128 code units long. When PCRE2_UTF is not set, they may contain only ASCII
 alphanumeric characters and underscores, but must start with a non-digit. When
 PCRE2_UTF is set, the syntax of group names is extended to allow any Unicode
 letter or Unicode decimal digit. In other words, group names must match one of
@@ -4229,6 +4230,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 28 March 2025
+Last updated: 17 August 2025
 Copyright (c) 1997-2024 University of Cambridge.
 .fi

maint/CheckMan

Lines changed: 1 addition & 0 deletions
@@ -39,6 +39,7 @@ while (scalar(@ARGV) > 0)
 ^\.P\s*$|
 ^\.PP\s*$|
 ^\.\\"(?:\ HREF)?\s*$|
+^\.\\"\sDEFINE\s\w+$|
 ^\.\\"\sHTML\s<a\shref="[^"]+?">\s*$|
 ^\.\\"\sHTML\s<a\sname="[^"]+?"><\/a>\s*$|
 ^\.\\"\s<\/a>\s*$|
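
To see which lines the added CheckMan alternative accepts, the pattern can be tried in isolation. The Python sketch below transcribes only that one branch. `is_define_tag` is a hypothetical helper, and `re.ASCII` is an assumption made here to keep `\w` ASCII-only; the diff does not show which modifiers CheckMan applies to the full alternation.

```python
import re

# The single alternative added to CheckMan, transcribed as a standalone
# pattern. In CheckMan it is one branch of a much larger alternation.
DEFINE_TAG = re.compile(r'^\.\\"\sDEFINE\s\w+$', re.ASCII)


def is_define_tag(line):
    """True if the line is a well-formed DEFINE tag comment."""
    return DEFINE_TAG.match(line) is not None


for candidate in ['.\\" DEFINE MAX_NAME_SIZE',   # tag used in this commit
                  '.\\" DEFINE max_name_size',   # lowercase name
                  '.\\" DEFINE',                 # name missing
                  '.\\" DEFINE MAX NAME']:       # two words
    print(candidate, is_define_tag(candidate))
```

Note that `\w+` also admits lowercase names, while the LintMan pattern `[[:upper:]_\d]+` does not, so a lowercase tag would appear to pass CheckMan and then be skipped by LintMan rather than reported.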

maint/LintMan

Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
+#!/usr/bin/perl
+
+use warnings;
+use strict;
+use Getopt::Long;
+use vars qw /$opt_verbose/;
+
+# A script to scan PCRE2's man pages to check for values that might need to
+# be updated to match the code.
+#
+# It updates numerical values after \" DEFINE <name> or errors if name is
+# not found.
+
+my $file;
+my %defs;
+
+foreach $file ("../src/config.h.generic")
+{
+  open (INCLUDE, $file) or die "Failed to open include $file\n";
+
+  while (<INCLUDE>)
+  {
+    next unless /^#define ([[:upper:]_\d]+)\s+(\d+)/a;
+    $defs{$1} = $2;
+  }
+
+  close(INCLUDE);
+}
+
+GetOptions("verbose");
+while (scalar(@ARGV) > 0)
+{
+  $file = shift @ARGV;
+
+  open my $fh, "+<", $file or die "Failed to open $file\n";
+
+  my @lines = <$fh>;
+  my $updated = 0;
+
+  foreach my $index (0 .. $#lines)
+  {
+    if ($lines[$index] =~ /^\.\\"\sDEFINE\s([[:upper:]_\d]+)$/a)
+    {
+      my $l = $index + 1;
+      die "Invalid DEFINE line $l of $file\n" unless defined $lines[$l];
+
+      my $key = $1;
+      die "Bad DEFINE key $key line $l of $file\n" unless exists $defs{$key};
+
+      my $value = $defs{$key};
+      if ($lines[$index + 1] !~ /^$value\b/)
+      {
+        $updated += $lines[$index + 1] =~ s/^\d+/$value/a;
+        print "Updated $key in $file to $value\n" if $opt_verbose;
+      }
+    }
+  }
+
+  if ($updated > 0)
+  {
+    seek($fh, 0, 0);
+    print $fh @lines;
+    truncate($fh, tell($fh));
+  }
+  close($fh);
+}
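
For readers who do not read Perl, the two passes of LintMan (collect numeric `#define` values, then fix the line after each tag) can be re-expressed in Python. This is an illustrative sketch and not part of the commit: `parse_defines` and `lint_lines` are hypothetical names, and the sketch works on in-memory lines without trailing newlines instead of rewriting files in place.

```python
import re

# Matches lines such as "#define MAX_NAME_SIZE 128" (numeric values only).
DEFINE_RE = re.compile(r'^#define ([A-Z_0-9]+)\s+(\d+)', re.ASCII)
# Matches the man page tag, e.g. '.\" DEFINE MAX_NAME_SIZE'.
TAG_RE = re.compile(r'^\.\\"\sDEFINE\s([A-Z_0-9]+)$', re.ASCII)


def parse_defines(header_text):
    """Collect NAME -> value for every numeric #define in the header text."""
    defs = {}
    for line in header_text.splitlines():
        m = DEFINE_RE.match(line)
        if m:
            defs[m.group(1)] = m.group(2)
    return defs


def lint_lines(lines, defs):
    """Return (new_lines, updated_count) after syncing tagged values."""
    out = list(lines)
    updated = 0
    for i, line in enumerate(out):
        m = TAG_RE.match(line)
        if not m:
            continue
        if i + 1 >= len(out):
            raise ValueError(f"DEFINE tag on line {i + 1} has no following line")
        key = m.group(1)
        if key not in defs:
            raise KeyError(f"unknown DEFINE key {key} on line {i + 1}")
        value = defs[key]
        # Only touch the next line if it does not already start with the value.
        if not re.match(rf'{value}\b', out[i + 1]):
            out[i + 1], n = re.subn(r'^\d+', value, out[i + 1],
                                    count=1, flags=re.ASCII)
            updated += n
    return out, updated


defs = parse_defines("#define MAX_NAME_COUNT 10000\n#define MAX_NAME_SIZE 128\n")
page = ['defaults to', '.\\" DEFINE MAX_NAME_SIZE', '32 code units, and so on']
new_page, updated = lint_lines(page, defs)
print(updated, new_page[2])
```

As in the Perl script, a tag whose name is absent from the header is treated as an error, and a tagged line that does not begin with digits is left unchanged and not counted as updated.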

maint/README

Lines changed: 4 additions & 0 deletions
@@ -60,6 +60,10 @@ GenerateUcpTables.py
 GenerateCommon.py and Unicode data files. The generated file contains tables
 for looking up Unicode property names.

+LintMan
+A Perl script to check and update magic numbers in the documentation that
+correspond to configurable settings in the codebase.
+
 manifest-*
 Data files used to verify the contents of the distribution tarball and
 `make install` file lists.

maint/UpdateAlways

Lines changed: 7 additions & 0 deletions
@@ -19,6 +19,8 @@

 # Detrail A Perl script that removes trailing spaces from files.

+# LintMan A Perl script that lints man pages looking for inconsistencies.
+
 # doc/index.html.src
 # A file that is copied as index.html into the doc/html directory
 # when the HTML documentation is built. It works like this so that
@@ -54,6 +56,11 @@ echo Processing documentation
 perl ../maint/CheckMan *.1 *.3
 if [ $? != 0 ] ; then exit 1; fi

+if [ -f ../src/config.h.generic ] ; then
+  perl ../maint/LintMan -v *.3
+  if [ $? != 0 ] ; then exit 1; fi
+fi
+
 # Verify the version number in the man pages

 for file in *.1 *.3 ; do

vms/configure.com

Lines changed: 1 addition & 1 deletion
@@ -905,7 +905,7 @@ sure both macros are undefined; an emulation function will then be used. */
 #define PCRE2_EXPORT
 #define LINK_SIZE 2
 #define MAX_NAME_COUNT 10000
-#define MAX_NAME_SIZE 32
+#define MAX_NAME_SIZE 128
 #define MATCH_LIMIT 10000000
 #define HEAP_LIMIT 20000000
 #define NEWLINE_DEFAULT 2
