Commit 0a2059a

Improve Binary Encoding testing+docs. NFC (#25704)
1. Add test for all byte pairs on binary encoding, and 2. Improve binary encoding settings docs to mention that UTF-8 encoding is now needed, and 3. Add a note to ChangeLog to highlight this requirement. Addresses https://groups.google.com/g/emscripten-discuss/c/E_HmYqXGjN8
1 parent b1fc368 commit 0a2059a

4 files changed, +63 -1 lines changed

ChangeLog.md

Lines changed: 5 additions & 0 deletions
@@ -42,6 +42,11 @@ See docs/process.md for more on how version tagging works.
   are used via `--use-port=emdawnwebgpu`. See 4.0.10 release notes for details.
 - A new `CROSS_ORIGIN` setting was added in order to work around issues hosting
   emscripten programs across different origins (#25581)
+- The binary data encoding for `SINGLE_FILE` mode was changed from base64 to
+  directly embed binary data into UTF-8 string. Users who use the `SINGLE_FILE`
+  mode along with a custom HTML file should declare the files to have UTF-8
+  encoding. See `src/settings.js` docs on `SINGLE_FILE`. Use the option
+  `-sSINGLE_FILE_BINARY_ENCODE=0` to fall back to base64 encoding. (#25599)
 
 4.0.17 - 10/17/25
 -----------------
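
One way to confirm that a custom HTML file declares UTF-8 is to parse it and look for a `<meta>` charset declaration. The sketch below is not part of the commit; `CharsetFinder` and `declared_charset` are hypothetical helper names, and it only inspects `<meta>` tags (it does not look at HTTP headers or byte-order marks).

```python
from html.parser import HTMLParser


class CharsetFinder(HTMLParser):
    """Records the first charset declared by a <meta> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # Only the first <meta> declaration is kept.
        if tag != "meta" or self.charset is not None:
            return
        attr = {name: (value or "") for name, value in attrs}
        if "charset" in attr:
            # Form: <meta charset="utf-8">
            self.charset = attr["charset"].strip().lower()
        elif attr.get("http-equiv", "").lower() == "content-type":
            # Form: <meta http-equiv="content-type" content="...; charset=UTF-8">
            content = attr.get("content", "").lower()
            if "charset=" in content:
                self.charset = content.split("charset=", 1)[1].split(";")[0].strip()


def declared_charset(html_text):
    """Return the charset named in a <meta> tag, or None if there is none."""
    finder = CharsetFinder()
    finder.feed(html_text)
    return finder.charset
```

Calling `declared_charset(open("index.html", encoding="utf-8").read())` on a hand-written page should return `"utf-8"` once one of the `<meta>` forms is present, and `None` otherwise.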

site/source/docs/tools_reference/settings_reference.rst

Lines changed: 17 additions & 0 deletions
@@ -2837,6 +2837,9 @@ child-src directive to allow blob:. If you aren't using Content Security
 Policy, or your CSP header doesn't include either script-src or child-src,
 then you can safely ignore this warning.
 
+Note that SINGLE_FILE with binary encoding requires the HTML/JS files to be
+served with UTF-8 encoding. See the details on SINGLE_FILE_BINARY_ENCODE.
+
 Default value: false
 
 .. _single_file_binary_encode:
@@ -2851,6 +2854,20 @@ issues with the binary encoding. (and please let us know of any such issues)
 If no issues arise, this option will permanently become the default in the
 future.
 
+NOTE: Binary encoding requires that the HTML/JS files are served with UTF-8
+encoding, and will not work with the default legacy Windows-1252 encoding
+that browsers might use on Windows. To enable UTF-8 encoding in a
+hand-crafted index.html file, apply any of:
+1. Add `<meta charset="utf-8">` inside the <head> section of HTML, or
+2. Add `<meta http-equiv="content-type" content="text/html; charset=UTF-8" />`` inside <head>, or
+3. Add `<meta http-equiv="content-type" content="application/json; charset=utf-8" />` inside <head>
+(if using -o foo.js with SINGLE_FILE mode to build HTML+JS), or
+4. pass the header `Content-Type: text/html; charset=utf-8` and/or header
+`Content-Type: application/javascript; charset=utf-8` when serving the
+relevant files that contain binary encoded content.
+If none of these are possible, disable binary encoding with
+-sSINGLE_FILE_BINARY_ENCODE=0 to fall back to base64 encoding.
+
 Default value: true
 
 .. _auto_js_libraries:
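
For option 4 in the note above, a local development server can attach the charset to the `Content-Type` header itself. The following is a minimal sketch using Python's standard `http.server`, not something shipped with emscripten; `UTF8_TYPES`, `content_type_for`, `Utf8Handler`, and `serve` are hypothetical names chosen for illustration.

```python
import functools
import http.server
import os

# Content types to send for files that may contain binary-encoded data.
UTF8_TYPES = {
    ".html": "text/html; charset=utf-8",
    ".js": "application/javascript; charset=utf-8",
}


def content_type_for(path, default):
    """Return a UTF-8 content type for .html/.js paths, else the given default."""
    ext = os.path.splitext(path)[1].lower()
    return UTF8_TYPES.get(ext, default)


class Utf8Handler(http.server.SimpleHTTPRequestHandler):
    def guess_type(self, path):
        # Fall back to the stock guess for every other file type.
        return content_type_for(path, super().guess_type(path))


def serve(directory=".", port=8000):
    """Serve `directory` on localhost; blocks until interrupted."""
    handler = functools.partial(Utf8Handler, directory=directory)
    with http.server.ThreadingHTTPServer(("127.0.0.1", port), handler) as httpd:
        httpd.serve_forever()
```

Running `serve("out", 8000)` and loading the page from `http://127.0.0.1:8000/` should then show `charset=utf-8` in the response headers for the HTML and JS files. Production servers have their own configuration for this, which is outside the scope of this sketch.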

src/settings.js

Lines changed: 17 additions & 0 deletions
@@ -1852,6 +1852,9 @@ var WASMFS = false;
 // child-src directive to allow blob:. If you aren't using Content Security
 // Policy, or your CSP header doesn't include either script-src or child-src,
 // then you can safely ignore this warning.
+//
+// Note that SINGLE_FILE with binary encoding requires the HTML/JS files to be
+// served with UTF-8 encoding. See the details on SINGLE_FILE_BINARY_ENCODE.
 // [link]
 var SINGLE_FILE = false;
 
@@ -1861,6 +1864,20 @@ var SINGLE_FILE = false;
 // issues with the binary encoding. (and please let us know of any such issues)
 // If no issues arise, this option will permanently become the default in the
 // future.
+//
+// NOTE: Binary encoding requires that the HTML/JS files are served with UTF-8
+// encoding, and will not work with the default legacy Windows-1252 encoding
+// that browsers might use on Windows. To enable UTF-8 encoding in a
+// hand-crafted index.html file, apply any of:
+// 1. Add `<meta charset="utf-8">` inside the <head> section of HTML, or
+// 2. Add `<meta http-equiv="content-type" content="text/html; charset=UTF-8" />`` inside <head>, or
+// 3. Add `<meta http-equiv="content-type" content="application/json; charset=utf-8" />` inside <head>
+// (if using -o foo.js with SINGLE_FILE mode to build HTML+JS), or
+// 4. pass the header `Content-Type: text/html; charset=utf-8` and/or header
+// `Content-Type: application/javascript; charset=utf-8` when serving the
+// relevant files that contain binary encoded content.
+// If none of these are possible, disable binary encoding with
+// -sSINGLE_FILE_BINARY_ENCODE=0 to fall back to base64 encoding.
 // [link]
 var SINGLE_FILE_BINARY_ENCODE = true;
 
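
The Windows-1252 caveat in the note can be illustrated in isolation. The sketch below does not use emscripten's encoder and makes no claim about which characters it emits; it only shows the general effect of decoding UTF-8 bytes with the wrong charset. `decode_as` is a hypothetical helper.

```python
def decode_as(text, charset):
    """Encode `text` as UTF-8, then decode those bytes with `charset`."""
    return text.encode("utf-8").decode(charset)


original = "\u00e9"  # one character, stored as the two bytes C3 A9 in UTF-8

# Declared correctly: the two bytes are read back as the same single character.
round_tripped = decode_as(original, "utf-8")

# Misread as Windows-1252: each byte becomes its own character.
misread = decode_as(original, "cp1252")

print(len(round_tripped), len(misread))  # prints "1 2"
```

Data embedded directly in a UTF-8 string depends on the reader reassembling multi-byte sequences, so a page interpreted as Windows-1252 would hand the decoder different characters than the ones that were written.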

test/test_other.py

Lines changed: 24 additions & 1 deletion
@@ -17,6 +17,7 @@
 import select
 import shlex
 import shutil
+import struct
 import subprocess
 import sys
 import tarfile
@@ -89,6 +90,7 @@
 
 from tools import building, cache, response_file, shared, utils, webassembly
 from tools.building import get_building_env
+from tools.link import binary_encode
 from tools.settings import settings
 from tools.shared import (
   CLANG_CC,
@@ -106,7 +108,15 @@
   config,
 )
 from tools.system_libs import DETERMINISTIC_PREFIX
-from tools.utils import MACOS, WINDOWS, delete_file, read_binary, read_file, write_file
+from tools.utils import (
+  MACOS,
+  WINDOWS,
+  delete_file,
+  read_binary,
+  read_file,
+  write_binary,
+  write_file,
+)
 
 emmake = utils.bat_suffix(path_from_root('emmake'))
 emconfig = utils.bat_suffix(path_from_root('em-config'))
@@ -15175,3 +15185,16 @@ def test_linkable_relocatable(self):
     # These setting is due for removal:
     # https://github.com/emscripten-core/emscripten/issues/25262
     self.do_run_in_out_file_test('hello_world.c', cflags=['-Wno-deprecated', '-sLINKABLE', '-sRELOCATABLE'])
+
+  # Tests encoding of all byte pairs for binary encoding in SINGLE_FILE mode.
+  def test_binary_encode(self):
+    # Encode values 0 .. 65535 into test data
+    test_data = bytearray(struct.pack('<' + 'H' * 65536, *range(65536)))
+    write_binary('data.tmp', test_data)
+    binary_encoded = binary_encode('data.tmp')
+    test_js = '''var u16 = new Uint16Array(binaryDecode(src).buffer);
+for(var i = 0; i < 65536; ++i)
+  if (u16[i] != i) throw i;
+console.log('OK');'''
+    write_file('test.js', read_file(path_from_root('src/binaryDecode.js')) + '\nvar src = ' + binary_encoded + ';\n' + test_js)
+    self.assertContained('OK', self.run_js('test.js'))
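
The test data above can be examined on its own, without the emscripten tools. This sketch rebuilds the same payload with `struct.pack` and checks that the 65536 little-endian 16-bit values cover every possible (low byte, high byte) pair exactly once. `all_byte_pairs_payload` and `byte_pairs` are hypothetical helper names, and nothing here calls `binary_encode`.

```python
import struct


def all_byte_pairs_payload():
    """Pack the values 0..65535 as little-endian unsigned 16-bit integers."""
    return struct.pack("<" + "H" * 65536, *range(65536))


def byte_pairs(data):
    """Collect the (low, high) byte pair of each 16-bit slot in `data`."""
    return {(data[i], data[i + 1]) for i in range(0, len(data), 2)}


payload = all_byte_pairs_payload()

# 65536 values of 2 bytes each.
print(len(payload))              # prints 131072
# Every one of the 256 * 256 byte pairs occurs in some slot.
print(len(byte_pairs(payload)))  # prints 65536
```

Because each value appears once, a decoder that corrupts any particular byte pair produces a mismatch at a specific index, which is what the `throw i` in the JavaScript check reports.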
