Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Updates
xmldom
to fix an issue where pairs of unicode values represented as XML-encoded character entities are combined into a single value.This comes from an issue filed on an internal project where the signature digest was not being properly calculated for assertion values that use two or more XML-encoded Chinese characters.
The meat of the investigation of that internal ticket is below.
---- Reproduced Text ----
Update to this. I checked out a couple of what I-believe-to-be-authoritative documents about multiple-byte character entities.
According to the W3C QA doc "Using character escapes in markup and CSS":
https://www.w3.org/International/questions/qa-escapes
Note that the case for which the two-byte character entity squashing code was added appears to be the German "LATIN SMALL LETTER U DIAERESIS" also commonly referred to as an "umlaut".
http://www.fileformat.info/info/unicode/char/fc/index.htm
The test case in
auth0/passport-wsfed-saml2
that attempts to validate multi character entity encoding is using the UTF-8 two-byte hexidecimal representationü
.passport-wsfed-saml2/test/interop.tests.js
Lines 192 to 205 in 6d50336
Note that the Unicode Binary and UTF-8 w/o formatting bits generates the same representation 00FC after padding the value with leading zeroes.
From the XML 1.0 specification "4.1 Character and Entity References" https://www.w3.org/TR/xml/#sec-references:
This would appear to indicate that the "code point" is the correct value representation and not the UTF-8 encoded value. This appears to be confirmed by testing using the Chrome browser and HTML.
Based on the spec it seems then that the correct approach is to drop support for two-byte UTF-8 encoded character entity representations entirely.