Commit 7e90bac

gh-140793: Improve documentation and tests for the ensure_ascii option in the json module (GH-140906)
* Document that ensure_ascii=True forces escaping not only non-ASCII, but also non-printable characters (the only affected ASCII character is U+007F).
* Ensure that the help output for the json module does not exceed 80 columns (except one long line in an example and generated lines).
* Add more tests.
1 parent 8cec3d3 commit 7e90bac

6 files changed: +89 -43 lines changed

Doc/library/json.rst

Lines changed: 9 additions & 5 deletions
@@ -183,8 +183,10 @@ Basic Usage

    :param bool ensure_ascii:
       If ``True`` (the default), the output is guaranteed to
-      have all incoming non-ASCII characters escaped.
-      If ``False``, these characters will be outputted as-is.
+      have all incoming non-ASCII and non-printable characters escaped.
+      If ``False``, all characters will be outputted as-is, except for
+      the characters that must be escaped: quotation mark, reverse solidus,
+      and the control characters U+0000 through U+001F.

    :param bool check_circular:
       If ``False``, the circular reference check for container types is skipped
@@ -495,8 +497,10 @@ Encoders and Decoders
    :class:`bool` or ``None``. If *skipkeys* is true, such items are simply skipped.

    If *ensure_ascii* is true (the default), the output is guaranteed to
-   have all incoming non-ASCII characters escaped. If *ensure_ascii* is
-   false, these characters will be output as-is.
+   have all incoming non-ASCII and non-printable characters escaped.
+   If *ensure_ascii* is false, all characters will be output as-is, except for
+   the characters that must be escaped: quotation mark, reverse solidus,
+   and the control characters U+0000 through U+001F.

    If *check_circular* is true (the default), then lists, dicts, and custom
    encoded objects will be checked for circular references during encoding to
@@ -636,7 +640,7 @@ UTF-32, with UTF-8 being the recommended default for maximum interoperability.

 As permitted, though not required, by the RFC, this module's serializer sets
 *ensure_ascii=True* by default, thus escaping the output so that the resulting
-strings only contain ASCII characters.
+strings only contain printable ASCII characters.

 Other than the *ensure_ascii* parameter, this module is defined strictly in
 terms of conversion between Python objects and
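
The documented behavior can be checked with a short sketch. The sample string is an illustrative choice, not taken from the commit; it contains one non-ASCII character (U+00E9) and the one ASCII character the new wording is about (U+007F).

```python
import json

# Sample text (an assumption for illustration): "café", a space, then U+007F.
text = "caf\u00e9 \x7f"

# Default ensure_ascii=True: both U+00E9 and U+007F are written as \uXXXX escapes.
escaped = json.dumps(text)

# ensure_ascii=False: both characters are emitted as-is.
raw = json.dumps(text, ensure_ascii=False)

print(escaped)     # "caf\u00e9 \u007f"
print(ascii(raw))  # '"caf\xe9 \x7f"'
```

Both outputs decode back to the same string with `json.loads`; only the default one consists solely of printable ASCII.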

Lib/json/__init__.py

Lines changed: 29 additions & 23 deletions
@@ -127,8 +127,9 @@ def dump(obj, fp, *, skipkeys=False, ensure_ascii=True, check_circular=True,
     instead of raising a ``TypeError``.

     If ``ensure_ascii`` is false, then the strings written to ``fp`` can
-    contain non-ASCII characters if they appear in strings contained in
-    ``obj``. Otherwise, all such characters are escaped in JSON strings.
+    contain non-ASCII and non-printable characters if they appear in strings
+    contained in ``obj``. Otherwise, all such characters are escaped in JSON
+    strings.

     If ``check_circular`` is false, then the circular reference check
     for container types will be skipped and a circular reference will
@@ -144,10 +145,11 @@ def dump(obj, fp, *, skipkeys=False, ensure_ascii=True, check_circular=True,
     level of 0 will only insert newlines. ``None`` is the most compact
     representation.

-    If specified, ``separators`` should be an ``(item_separator, key_separator)``
-    tuple. The default is ``(', ', ': ')`` if *indent* is ``None`` and
-    ``(',', ': ')`` otherwise. To get the most compact JSON representation,
-    you should specify ``(',', ':')`` to eliminate whitespace.
+    If specified, ``separators`` should be an ``(item_separator,
+    key_separator)`` tuple. The default is ``(', ', ': ')`` if *indent* is
+    ``None`` and ``(',', ': ')`` otherwise. To get the most compact JSON
+    representation, you should specify ``(',', ':')`` to eliminate
+    whitespace.

     ``default(obj)`` is a function that should return a serializable version
     of obj or raise TypeError. The default simply raises TypeError.
@@ -188,9 +190,10 @@ def dumps(obj, *, skipkeys=False, ensure_ascii=True, check_circular=True,
     (``str``, ``int``, ``float``, ``bool``, ``None``) will be skipped
     instead of raising a ``TypeError``.

-    If ``ensure_ascii`` is false, then the return value can contain non-ASCII
-    characters if they appear in strings contained in ``obj``. Otherwise, all
-    such characters are escaped in JSON strings.
+    If ``ensure_ascii`` is false, then the return value can contain
+    non-ASCII and non-printable characters if they appear in strings
+    contained in ``obj``. Otherwise, all such characters are escaped in
+    JSON strings.

     If ``check_circular`` is false, then the circular reference check
     for container types will be skipped and a circular reference will
@@ -206,10 +209,11 @@ def dumps(obj, *, skipkeys=False, ensure_ascii=True, check_circular=True,
     level of 0 will only insert newlines. ``None`` is the most compact
     representation.

-    If specified, ``separators`` should be an ``(item_separator, key_separator)``
-    tuple. The default is ``(', ', ': ')`` if *indent* is ``None`` and
-    ``(',', ': ')`` otherwise. To get the most compact JSON representation,
-    you should specify ``(',', ':')`` to eliminate whitespace.
+    If specified, ``separators`` should be an ``(item_separator,
+    key_separator)`` tuple. The default is ``(', ', ': ')`` if *indent* is
+    ``None`` and ``(',', ': ')`` otherwise. To get the most compact JSON
+    representation, you should specify ``(',', ':')`` to eliminate
+    whitespace.

     ``default(obj)`` is a function that should return a serializable version
     of obj or raise TypeError. The default simply raises TypeError.
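
The ``separators`` defaults described in this docstring can be seen in a small sketch. The sample dictionary is made up for illustration and is not part of the commit.

```python
import json

# Illustrative data, not taken from the commit.
data = {"a": [1, 2], "b": None}

# indent is None: the default separators are (', ', ': ').
default = json.dumps(data)

# Most compact form: no whitespace after ',' or ':'.
compact = json.dumps(data, separators=(",", ":"))

# With an indent, the default item separator becomes ',' so that
# lines do not end in trailing whitespace.
indented = json.dumps(data, indent=2)

print(default)   # {"a": [1, 2], "b": null}
print(compact)   # {"a":[1,2],"b":null}
print(indented)
```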
@@ -280,11 +284,12 @@ def load(fp, *, cls=None, object_hook=None, parse_float=None,
     ``object_hook`` will be used instead of the ``dict``. This feature
     can be used to implement custom decoders (e.g. JSON-RPC class hinting).

-    ``object_pairs_hook`` is an optional function that will be called with the
-    result of any object literal decoded with an ordered list of pairs. The
-    return value of ``object_pairs_hook`` will be used instead of the ``dict``.
-    This feature can be used to implement custom decoders. If ``object_hook``
-    is also defined, the ``object_pairs_hook`` takes priority.
+    ``object_pairs_hook`` is an optional function that will be called with
+    the result of any object literal decoded with an ordered list of pairs.
+    The return value of ``object_pairs_hook`` will be used instead of the
+    ``dict``. This feature can be used to implement custom decoders. If
+    ``object_hook`` is also defined, the ``object_pairs_hook`` takes
+    priority.

     To use a custom ``JSONDecoder`` subclass, specify it with the ``cls``
     kwarg; otherwise ``JSONDecoder`` is used.
@@ -305,11 +310,12 @@ def loads(s, *, cls=None, object_hook=None, parse_float=None,
     ``object_hook`` will be used instead of the ``dict``. This feature
     can be used to implement custom decoders (e.g. JSON-RPC class hinting).

-    ``object_pairs_hook`` is an optional function that will be called with the
-    result of any object literal decoded with an ordered list of pairs. The
-    return value of ``object_pairs_hook`` will be used instead of the ``dict``.
-    This feature can be used to implement custom decoders. If ``object_hook``
-    is also defined, the ``object_pairs_hook`` takes priority.
+    ``object_pairs_hook`` is an optional function that will be called with
+    the result of any object literal decoded with an ordered list of pairs.
+    The return value of ``object_pairs_hook`` will be used instead of the
+    ``dict``. This feature can be used to implement custom decoders. If
+    ``object_hook`` is also defined, the ``object_pairs_hook`` takes
+    priority.

     ``parse_float``, if specified, will be called with the string
     of every JSON float to be decoded. By default this is equivalent to
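
A brief sketch of the ``object_pairs_hook`` behavior described in these docstrings, including its priority over ``object_hook``. The input document and the `record_dict` helper are hypothetical, chosen only to make the difference visible.

```python
import json

# Hypothetical input with a duplicate key, to show what the hook receives.
doc = '{"b": 1, "a": 2, "b": 3}'

# Without hooks, a dict is built and the last duplicate key wins.
plain = json.loads(doc)

# With object_pairs_hook, the hook receives the ordered list of (key, value)
# pairs and its return value replaces the dict.
pairs = json.loads(doc, object_pairs_hook=list)

# Hypothetical object_hook that records every call it receives.
calls = []

def record_dict(d):
    calls.append(d)
    return d

# When both hooks are given, only object_pairs_hook is used.
both = json.loads(doc, object_hook=record_dict, object_pairs_hook=list)

print(plain)  # {'b': 3, 'a': 2}
print(pairs)  # [('b', 1), ('a', 2), ('b', 3)]
print(calls)  # []
```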

Lib/json/decoder.py

Lines changed: 4 additions & 4 deletions
@@ -297,10 +297,10 @@ def __init__(self, *, object_hook=None, parse_float=None,
         place of the given ``dict``. This can be used to provide custom
         deserializations (e.g. to support JSON-RPC class hinting).

-        ``object_pairs_hook``, if specified will be called with the result of
-        every JSON object decoded with an ordered list of pairs. The return
-        value of ``object_pairs_hook`` will be used instead of the ``dict``.
-        This feature can be used to implement custom decoders.
+        ``object_pairs_hook``, if specified will be called with the result
+        of every JSON object decoded with an ordered list of pairs. The
+        return value of ``object_pairs_hook`` will be used instead of the
+        ``dict``. This feature can be used to implement custom decoders.
         If ``object_hook`` is also defined, the ``object_pairs_hook`` takes
         priority.

Lib/json/encoder.py

Lines changed: 11 additions & 9 deletions
@@ -111,9 +111,10 @@ def __init__(self, *, skipkeys=False, ensure_ascii=True,
         encoding of keys that are not str, int, float, bool or None.
         If skipkeys is True, such items are simply skipped.

-        If ensure_ascii is true, the output is guaranteed to be str
-        objects with all incoming non-ASCII characters escaped. If
-        ensure_ascii is false, the output can contain non-ASCII characters.
+        If ensure_ascii is true, the output is guaranteed to be str objects
+        with all incoming non-ASCII and non-printable characters escaped.
+        If ensure_ascii is false, the output can contain non-ASCII and
+        non-printable characters.

         If check_circular is true, then lists, dicts, and custom encoded
         objects will be checked for circular references during encoding to
@@ -134,14 +135,15 @@ def __init__(self, *, skipkeys=False, ensure_ascii=True,
         indent level. An indent level of 0 will only insert newlines.
         None is the most compact representation.

-        If specified, separators should be an (item_separator, key_separator)
-        tuple. The default is (', ', ': ') if *indent* is ``None`` and
-        (',', ': ') otherwise. To get the most compact JSON representation,
-        you should specify (',', ':') to eliminate whitespace.
+        If specified, separators should be an (item_separator,
+        key_separator) tuple. The default is (', ', ': ') if *indent* is
+        ``None`` and (',', ': ') otherwise. To get the most compact JSON
+        representation, you should specify (',', ':') to eliminate
+        whitespace.

         If specified, default is a function that gets called for objects
-        that can't otherwise be serialized. It should return a JSON encodable
-        version of the object or raise a ``TypeError``.
+        that can't otherwise be serialized. It should return a JSON
+        encodable version of the object or raise a ``TypeError``.

         """
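
The ``default`` contract in this docstring (return a JSON-encodable version or raise ``TypeError``) might be used as in the following sketch. The `encode_extra` function and the choice of `datetime.date` are assumptions made for illustration; they do not appear in the commit.

```python
import json
from datetime import date

def encode_extra(obj):
    """Hypothetical ``default`` function: dates become ISO 8601 strings."""
    if isinstance(obj, date):
        return obj.isoformat()
    # Anything else is rejected, as the docstring requires.
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

encoder = json.JSONEncoder(default=encode_extra)

# date is not natively serializable, so encode_extra is called for it.
encoded = encoder.encode({"released": date(2025, 1, 2)})
print(encoded)  # {"released": "2025-01-02"}

# A set is handled neither natively nor by encode_extra, so TypeError propagates.
try:
    encoder.encode({"tags": {1, 2}})
    rejected = False
except TypeError:
    rejected = True
```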

Lib/test/test_json/test_encode_basestring_ascii.py

Lines changed: 1 addition & 2 deletions
@@ -8,13 +8,12 @@
     ('\u0123\u4567\u89ab\ucdef\uabcd\uef4a', '"\\u0123\\u4567\\u89ab\\ucdef\\uabcd\\uef4a"'),
     ('controls', '"controls"'),
     ('\x08\x0c\n\r\t', '"\\b\\f\\n\\r\\t"'),
+    ('\x00\x1f\x7f', '"\\u0000\\u001f\\u007f"'),
     ('{"object with 1 member":["array with 1 element"]}', '"{\\"object with 1 member\\":[\\"array with 1 element\\"]}"'),
     (' s p a c e d ', '" s p a c e d "'),
     ('\U0001d120', '"\\ud834\\udd20"'),
     ('\u03b1\u03a9', '"\\u03b1\\u03a9"'),
     ("`1~!@#$%^&*()_+-={':[,]}|;.</>?", '"`1~!@#$%^&*()_+-={\':[,]}|;.</>?"'),
-    ('\x08\x0c\n\r\t', '"\\b\\f\\n\\r\\t"'),
-    ('\u0123\u4567\u89ab\ucdef\uabcd\uef4a', '"\\u0123\\u4567\\u89ab\\ucdef\\uabcd\\uef4a"'),
 ]

 class TestEncodeBasestringAscii:

Lib/test/test_json/test_unicode.py

Lines changed: 35 additions & 0 deletions
@@ -32,6 +32,29 @@ def test_encoding7(self):
         j = self.dumps(u + "\n", ensure_ascii=False)
         self.assertEqual(j, f'"{u}\\n"')

+    def test_ascii_non_printable_encode(self):
+        u = '\b\t\n\f\r\x00\x1f\x7f'
+        self.assertEqual(self.dumps(u),
+                         '"\\b\\t\\n\\f\\r\\u0000\\u001f\\u007f"')
+        self.assertEqual(self.dumps(u, ensure_ascii=False),
+                         '"\\b\\t\\n\\f\\r\\u0000\\u001f\x7f"')
+
+    def test_ascii_non_printable_decode(self):
+        self.assertEqual(self.loads('"\\b\\t\\n\\f\\r"'),
+                         '\b\t\n\f\r')
+        s = ''.join(map(chr, range(32)))
+        for c in s:
+            self.assertRaises(self.JSONDecodeError, self.loads, f'"{c}"')
+        self.assertEqual(self.loads(f'"{s}"', strict=False), s)
+        self.assertEqual(self.loads('"\x7f"'), '\x7f')
+
+    def test_escaped_decode(self):
+        self.assertEqual(self.loads('"\\b\\t\\n\\f\\r"'), '\b\t\n\f\r')
+        self.assertEqual(self.loads('"\\"\\\\\\/"'), '"\\/')
+        for c in set(map(chr, range(0x100))) - set('"\\/bfnrt'):
+            self.assertRaises(self.JSONDecodeError, self.loads, f'"\\{c}"')
+            self.assertRaises(self.JSONDecodeError, self.loads, f'"\\{c}"', strict=False)
+
     def test_big_unicode_encode(self):
         u = '\U0001d120'
         self.assertEqual(self.dumps(u), '"\\ud834\\udd20"')
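
What the new decode tests assert about unescaped control characters can be reproduced outside the test harness. The sketch below uses a single literal tab as the sample control character, an illustrative choice; the tests themselves cover all of U+0000 through U+001F.

```python
import json

# JSON text whose string contains a literal (unescaped) tab character.
raw_tab = '"a\tb"'

# Default strict=True: unescaped control characters U+0000-U+001F are rejected.
try:
    json.loads(raw_tab)
    strict_error = None
except json.JSONDecodeError as exc:
    strict_error = exc

# strict=False accepts the same input.
lenient = json.loads(raw_tab, strict=False)

# U+007F is outside that range, so it is accepted even in strict mode.
delete_char = json.loads('"\x7f"')

print(type(strict_error).__name__)  # JSONDecodeError
print(repr(lenient))                # 'a\tb'
print(repr(delete_char))            # '\x7f'
```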
@@ -48,6 +71,18 @@ def test_unicode_decode(self):
             s = f'"\\u{i:04x}"'
             self.assertEqual(self.loads(s), u)

+    def test_single_surrogate_encode(self):
+        self.assertEqual(self.dumps('\uD83D'), '"\\ud83d"')
+        self.assertEqual(self.dumps('\uD83D', ensure_ascii=False), '"\ud83d"')
+        self.assertEqual(self.dumps('\uDC0D'), '"\\udc0d"')
+        self.assertEqual(self.dumps('\uDC0D', ensure_ascii=False), '"\udc0d"')
+
+    def test_single_surrogate_decode(self):
+        self.assertEqual(self.loads('"\uD83D"'), '\ud83d')
+        self.assertEqual(self.loads('"\\uD83D"'), '\ud83d')
+        self.assertEqual(self.loads('"\udc0d"'), '\udc0d')
+        self.assertEqual(self.loads('"\\udc0d"'), '\udc0d')
+
     def test_unicode_preservation(self):
         self.assertEqual(type(self.loads('""')), str)
         self.assertEqual(type(self.loads('"a"')), str)
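
The lone-surrogate tests have a practical consequence, sketched below. The UTF-8 encoding step is an addition for illustration and is not part of the tests: it shows that the ``ensure_ascii=False`` output for a lone surrogate is a valid Python ``str`` but cannot be written out as UTF-8.

```python
import json

# A high surrogate with no low surrogate following it.
lone = "\ud83d"

# Default: the surrogate is written as a six-character ASCII escape.
escaped = json.dumps(lone)

# ensure_ascii=False: the surrogate code point is passed through unchanged.
raw = json.dumps(lone, ensure_ascii=False)

# Illustrative extra step: a str holding a lone surrogate is not encodable as UTF-8.
try:
    raw.encode("utf-8")
    utf8_ok = True
except UnicodeEncodeError:
    utf8_ok = False

print(escaped)  # "\ud83d"
print(utf8_ok)  # False
```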
