Skip to content

Incorrect handling of invalid surrogate pair characters

High severity GitHub Reviewed Published Jul 2, 2022 in ultrajson/ultrajson • Updated Jan 27, 2023

Package

pip ujson (pip)

Affected versions

< 5.4.0

Patched versions

5.4.0

Description

Impact

What kind of vulnerability is it? Who is impacted?

Anyone parsing JSON from an untrusted source is vulnerable.

JSON strings that contain escaped surrogate characters not part of a proper surrogate pair were decoded incorrectly. Besides corrupting strings, this allowed for potential key confusion and value overwriting in dictionaries.

Examples:

# An unpaired high surrogate character is ignored.
>>> ujson.loads(r'"\uD800"')
''
>>> ujson.loads(r'"\uD800hello"')
'hello'

# An unpaired low surrogate character is preserved.
>>> ujson.loads(r'"\uDC00"')
'\udc00'

# A pair of surrogates with additional non surrogate characters pair up in spite of being invalid.
>>> ujson.loads(r'"\uD800foo bar\uDC00"')
'foo bar𐀀'

Patches

Has the problem been patched? What versions should users upgrade to?

Users should upgrade to UltraJSON 5.4.0.

From version 5.4.0, UltraJSON decodes lone surrogates in the same way as the standard library's json module does, preserving them in the parsed output:

>>> ujson.loads(r'"\uD800"')
'\ud800'
>>> ujson.loads(r'"\uD800hello"')
'\ud800hello'
>>> ujson.loads(r'"\uDC00"')
'\udc00'
>>> ujson.loads(r'"\uD800foo bar\uDC00"')
'\ud800foo bar\udc00'

Workarounds

Is there a way for users to fix or remediate the vulnerability without upgrading?

Short of switching to an entirely different JSON library, there are no safe alternatives to upgrading.

For more information

If you have any questions or comments about this advisory:

References

@hugovk hugovk published to ultrajson/ultrajson Jul 2, 2022
Published by the National Vulnerability Database Jul 5, 2022
Published to the GitHub Advisory Database Jul 5, 2022
Reviewed Jul 5, 2022
Last updated Jan 27, 2023

Severity

High

CVSS overall score

This score calculates overall vulnerability severity from 0 to 10 and is based on the Common Vulnerability Scoring System (CVSS).
/ 10

CVSS v3 base metrics

Attack vector
Network
Attack complexity
Low
Privileges required
None
User interaction
None
Scope
Unchanged
Confidentiality
None
Integrity
None
Availability
High

CVSS v3 base metrics

Attack vector: More severe the more the remote (logically and physically) an attacker can be in order to exploit the vulnerability.
Attack complexity: More severe for the least complex attacks.
Privileges required: More severe if no privileges are required.
User interaction: More severe when no user interaction is required.
Scope: More severe when a scope change occurs, e.g. one vulnerable component impacts resources in components beyond its security scope.
Confidentiality: More severe when loss of data confidentiality is highest, measuring the level of data access available to an unauthorized user.
Integrity: More severe when loss of data integrity is the highest, measuring the consequence of data modification possible by an unauthorized user.
Availability: More severe when the loss of impacted component availability is highest.
CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H

EPSS score

0.212%
(59th percentile)

Weaknesses

CVE ID

CVE-2022-31116

GHSA ID

GHSA-wpqr-jcpx-745r

Source code

Credits

Loading Checking history
See something to contribute? Suggest improvements for this vulnerability.