
@wvell commented Aug 1, 2025

We encountered strange behaviour on our macOS development machines: the space-collapse fix added a space between every character.

The cause appears to be that the `/x` modifier strips the `U::NO_BREAK_SPACE` character from the pattern. This only seems to happen on macOS when `setlocale()` is called before the function runs. On Ubuntu, with identical PHP and PCRE versions, it works as expected.

I have attached a small test script that shows the error on my system.

Several existing issues touch on this problem:

- PHP version: 8.4.8
- PCRE library: 10.45 2025-02-05

```php
<?php

$nbsp        = "\xC2\xA0";
$nbspEscaped = "\\x{00A0}";
$input       = "a{$nbsp}b";

// The result should be: a-b
// First without setlocale() (the default locale on my Mac is C.UTF-8).
echo preg_replace("/{$nbsp}/xu", '-', $input) . "\n"; // Works: outputs a-b

// Explicitly set the locale that is already in effect.
setlocale(LC_CTYPE, "C.UTF-8");
echo preg_replace("/{$nbsp}/xu", '-', $input) . "\n"; // Wrong output: -a-b-
echo preg_replace("/{$nbspEscaped}/xu", '-', $input) . "\n"; // Correct output: a-b

// Any locale I set seems to trigger the error.
setlocale(LC_CTYPE, "en_US.UTF-8");
echo preg_replace("/{$nbsp}/xu", '-', $input) . "\n"; // Wrong output: -a-b-
echo preg_replace("/{$nbspEscaped}/xu", '-', $input) . "\n"; // Correct output: a-b
```

…rip it out

Previously, the code injected a raw NBSP byte sequence (`"\xC2\xA0"`) into the
patterns. Under PCRE2’s extended mode (`/x`) and certain locales (notably on
macOS after a `setlocale()` call), that literal character is treated as
ignorable "whitespace" in the pattern and dropped at compile time. The affected
regexes then collapse into zero-length matches.
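The stripping mechanism can be illustrated with a character that `/x` treats as pattern whitespace in every locale. This minimal sketch uses an ASCII space instead of the NBSP. That substitution is deliberate: per the report above, whether a raw NBSP is dropped depends on platform and locale, so its result is not portable. The unescaped space vanishes from the compiled pattern, while the `\x20` escape survives:

```php
<?php
// Sketch using an ASCII space, which /x treats as pattern whitespace in every
// locale. (A raw NBSP only behaves like this on the affected setups.)

// The literal space is skipped at compile time, so '/a b/x' acts like '/ab/'.
$literalSpace = preg_match('/a b/x', 'ab');

// In single quotes PHP keeps the backslash, so the pattern text contains the
// four ASCII characters \x20. /x leaves an escape alone, and PCRE still
// requires a real space between "a" and "b".
$escapedNoSpace   = preg_match('/a\x20b/x', 'ab');
$escapedWithSpace = preg_match('/a\x20b/x', 'a b');

// Prints int(1), int(0), int(1), each on its own line.
var_dump($literalSpace, $escapedNoSpace, $escapedWithSpace);
```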

Switching to the PCRE escape `\x{00A0}` means the pattern text itself contains
only ASCII characters, which `/x` never strips. PCRE then decodes the escape to
the NBSP code point, so the regex reliably matches actual non-breaking spaces
across platforms and locales.
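A minimal sketch of the escaped form in use. The helper name `replace_nbsp()` is hypothetical and is not the library's actual code. The point is that the pattern is written in single quotes, so PHP passes `\x{00A0}` through unchanged and PCRE, not PHP, turns it into U+00A0:

```php
<?php
// Hypothetical helper for illustration; not the library's actual code.
function replace_nbsp(string $subject, string $replacement): string
{
    // Single quotes keep the backslash, so PCRE decodes \x{00A0} as U+00A0.
    // The pattern text is pure ASCII, so /x has no literal NBSP to strip.
    // /u makes PCRE treat both the pattern and the subject as UTF-8.
    return preg_replace('/\x{00A0}/xu', $replacement, $subject);
}

// Calling setlocale() first is what triggered the bug on macOS. With the
// escaped pattern, the result is the same whether or not this locale exists.
setlocale(LC_CTYPE, 'en_US.UTF-8');

echo replace_nbsp("a\u{00A0}b", '-'), "\n"; // a-b
```

An ordinary ASCII space in the subject is left untouched, because the pattern matches only U+00A0.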

@mundschenk-at (Owner)
Thank you for the report; I'll take a closer look in the next few days. (I'd prefer a solution without a magic constant, if possible.)

@mundschenk-at self-assigned this Aug 1, 2025