Unicode XML character entities fix #82

ghost · 2018-04-25T00:55:55Z

Updates xmldom to fix an issue where pairs of Unicode values represented as XML-encoded character entities are combined into a single value.

This comes from an issue filed on an internal project where the signature digest was not being properly calculated for assertion values that use two or more XML-encoded Chinese characters.

The meat of the investigation of that internal ticket is below.

---- Reproduced Text ----
Update to this. I checked a couple of documents that I believe to be authoritative on multi-byte character entities.

According to the W3C QA doc "Using character escapes in markup and CSS":
https://www.w3.org/International/questions/qa-escapes

Supplementary characters. Supplementary characters are those Unicode characters beyond the Basic Multilingual Plane (BMP). In UTF-16 a supplementary character is encoded using two 16-bit surrogate code points from the BMP. Because of this, or because of experience with older versions of JavaScript syntax, some people think that supplementary characters need to be represented using two escapes, but this is incorrect – you must use the single, code point value for that character. For example, use &#x233B4; rather than &#xD84C;&#xDFB4;.
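
As a rough illustration of the W3C guidance (not code from xmldom or this PR), the sketch below converts the surrogate pair from the quote into the single code point that a character reference should carry. The function name `surrogatePairToCodePoint` is made up for this example.

```javascript
// Hypothetical helper: combine a UTF-16 surrogate pair into one code point.
function surrogatePairToCodePoint(high, low) {
  if (high < 0xD800 || high > 0xDBFF || low < 0xDC00 || low > 0xDFFF) {
    throw new RangeError('not a valid high/low surrogate pair');
  }
  // Each surrogate contributes 10 bits; the result is offset by 0x10000.
  return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
}

var codePoint = surrogatePairToCodePoint(0xD84C, 0xDFB4);

// The single character reference that should be written instead of two escapes.
var reference = '&#x' + codePoint.toString(16).toUpperCase() + ';';

console.log(reference);
```

The pair quoted above therefore corresponds to one reference for the code point U+233B4, not two references for the surrogates.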

Note that the case for which the two-byte character entity squashing code was added appears to be the German ü (U+00FC, "LATIN SMALL LETTER U WITH DIAERESIS"), commonly referred to as a "u umlaut".
http://www.fileformat.info/info/unicode/char/fc/index.htm

The test case in auth0/passport-wsfed-saml2 that attempts to validate multi-character entity encoding uses the UTF-8 two-byte hexadecimal representation &#xC3;&#xBC; (which renders as Ã¼ when each reference is read as its own code point).

passport-wsfed-saml2/test/interop.tests.js

Lines 192 to 205 in 6d50336

	it('should validate an assertion with umlats xmldom', function (done) {
	var signedAssertion = new Parser().parseFromString('<saml2:Assertion xmlns:saml2="urn:oasis:names:tc:SAML:2.0:assertion" xmlns:xs="http://www.w3.org/2001/XMLSchema" ID="_071de65ecb79185206fcb0789e9afd90" IssueInstant="2014-04-06T22:27:04.997Z" Version="2.0"><saml2:Issuer Format="urn:oasis:names:tc:SAML:2.0:nameid-format:entity">https://aai-logon.ethz.ch/idp/shibboleth</saml2:Issuer><ds:Signature xmlns:ds="http://www.w3.org/2000/09/xmldsig#"><ds:SignedInfo><ds:CanonicalizationMethod Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"></ds:CanonicalizationMethod><ds:SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#rsa-sha1"></ds:SignatureMethod><ds:Reference URI="#_071de65ecb79185206fcb0789e9afd90"><ds:Transforms><ds:Transform Algorithm="http://www.w3.org/2000/09/xmldsig#enveloped-signature"></ds:Transform><ds:Transform Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"><ec:InclusiveNamespaces xmlns:ec="http://www.w3.org/2001/10/xml-exc-c14n#" PrefixList="xs"></ec:InclusiveNamespaces></ds:Transform></ds:Transforms><ds:DigestMethod 
Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"></ds:DigestMethod><ds:DigestValue>jVMwKZ5O3hXfOf6tkVan2hnPW2w=</ds:DigestValue></ds:Reference></ds:SignedInfo><ds:SignatureValue>nq5nJangoli5J6uBF/sEeYyKL7+xepbsDmjT6mpggLmba6yR+lQaZmAGnti8nhZUPyXwZfZS3d9oH4upbRg56jdVVcPaZUhYOPW2T2etm7lxxaDlHDJo/E40KnBtGMn6Oxz23hXUrc6p6K4FFLCQwmsE3ZZlP/u8DcqKNl5X/D5udcCV75mjxnVKWuXu34Xw4uQEQBb+6UfGjDN1/91M6U3ZZ0iOSRsBC7+SYLVMbDZqGveioKjZMPBuHmoBwQxsCixu1var3LNyCFVRo0LV9qA5DhA5lyH209+kFsN9vqzHKkiOF+Wua+Ngh2oR/48CWfTOjDuvRpje1bICIwwCQg==</ds:SignatureValue><ds:KeyInfo><ds:X509Data><ds:X509Certificate>MIIFjzCCBHegAwIBAgIUZ+QtvaEucMtOcruHlzQrEDH92FMwDQYJKoZIhvcNAQEFBQAwazELMAkG\nA1UEBhMCQk0xGTAXBgNVBAoTEFF1b1ZhZGlzIExpbWl0ZWQxHzAdBgNVBAsTFnd3dy5xdW92YWRp\nc2dsb2JhbC5jb20xIDAeBgNVBAMTF1F1b1ZhZGlzIEdsb2JhbCBTU0wgSUNBMB4XDTEzMDQxNzA4\nMDYwNFoXDTE1MDQxNzA4MDYwNFowYzELMAkGA1UEBhMCQ0gxEDAOBgNVBAgTB1p1ZXJpY2gxEDAO\nBgNVBAcTB1p1ZXJpY2gxFDASBgNVBAoTC0VUSCBadWVyaWNoMRowGAYDVQQDExFhYWktbG9nb24u\nZXRoei5jaDCCASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoCggEBAOJWLI4vWx5HnqUvkBDm5Egp\nUg8yOlL3HbS0Y62/k77R2W9wxNczcR79wUBl2cNDCF/LxzdY1ml2u2skbZy4tqtmcvHVrwM5RVDb\n3jpjUhzBlD5rkpxgut2zFmNsahXzceD9dzsTvq7MUq6YgW6iRY3wNbes7ZgRtdkCz+vbiB52iTES\nZ2lo6fBn69eiqywUhQ5t/K4jGqpSUf1DITz//lMWRveagVyUq342JONxo93nt6x6ewGg+Qo8yCuC\nj4VehpncHYV0oNI2sSncKPm23Z4TNxPDalSaq8R5nKhueG+FHX7Ks8hWYSf42m2rrZLTumv2Ry8H\nFrPFkI7kuSFwVRECAwEAAaOCAjEwggItMHQGCCsGAQUFBwEBBGgwZjAqBggrBgEFBQcwAYYeaHR0\ncDovL29jc3AucXVvdmFkaXNnbG9iYWwuY29tMDgGCCsGAQUFBzAChixodHRwOi8vdHJ1c3QucXVv\ndmFkaXNnbG9iYWwuY29tL3F2c3NsaWNhLmNydDCBtQYDVR0RBIGtMIGqghFhYWktbG9nb24uZXRo\nei5jaIIPdmNpcGhlci5ldGh6LmNogg92Y2Flc2FyLmV0aHouY2iCD3ZjdXJ0ZXIuZXRoei5jaIIP\ndmNvcHBlci5ldGh6LmNogg92Y2Vuc29yLmV0aHouY2iCEmxkYXBzLWluZm8uZXRoei5jaIIPbGlu\ndGVzdC5ldGh6LmNogRt2bGFkaXNsYXYubmVzcG9yQGlkLmV0aHouY2gwUQYDVR0gBEowSDBGBgwr\nBgEEAb5YAAJkAQEwNjA0BggrBgEFBQcCARYoaHR0cDovL3d3dy5xdW92YWRpc2dsb2JhbC5jb20v\ncmVwb3NpdG9yeTAOBgNVHQ8BAf8EBAMCBaAwHQYDVR0lBBYwFAYIKwYBBQUHAwEGCCsGAQUFBwMC\nMB8GA1U
dIwQYMBaAFDJNoU/q8K6Ztu6bByyECBFQi+J+MDsGA1UdHwQ0MDIwMKAuoCyGKmh0dHA6\nLy9jcmwucXVvdmFkaXNnbG9iYWwuY29tL3F2c3NsaWNhLmNybDAdBgNVHQ4EFgQUUrfY5AJdnN5W\n9TTyrVObbQEoH/cwDQYJKoZIhvcNAQEFBQADggEBAJHQIjLbalw9LF9wIjhhOsEsaf/Bd8dSKcb2\nICLC16TyetuTTJfqHqHr3QiAcrSNKOxqoFBX51t7oNyd3n1BGxJeYmpoyKHKmViUF9mJWBKxSvfW\njmYA7M/LptNX+aUz0fPntCokjH5pPAk3n5YYf2gTFOmRbZDdvNxQ0+o5EkRKkxLDAYM7HlJshWfK\nyY8ZKiPSx28ebXORGzW/VC5VunURFPmhvy5hUFo2qFhGhkQZD1Tg5uN+vd7KywgXLiQKWFDweOxY\nkFuTatM9peWNaapAuaYL8D6q/pn6q76cDKiMjTLp1siQsVVzFAZNjywOve5tdqB/Qo7zwX7TggF1\nmrQ=</ds:X509Certificate></ds:X509Data></ds:KeyInfo></ds:Signature><saml2:Subject><saml2:NameID Format="urn:oasis:names:tc:SAML:2.0:nameid-format:transient" NameQualifier="https://aai-logon.ethz.ch/idp/shibboleth">_e132eb870c4a912c56e1bafeb5257b35</saml2:NameID><saml2:SubjectConfirmation Method="urn:oasis:names:tc:SAML:2.0:cm:bearer"><saml2:SubjectConfirmationData Address="80.218.183.64" InResponseTo="_e8cda4c13682e111e66d" NotOnOrAfter="2014-04-06T22:32:04.997Z" Recipient="https://fmi-test.auth0.com/login/callback"></saml2:SubjectConfirmationData></saml2:SubjectConfirmation></saml2:Subject><saml2:Conditions NotBefore="2014-04-06T22:27:04.997Z" NotOnOrAfter="2014-04-06T22:32:04.997Z"><saml2:AudienceRestriction><saml2:Audience>urn:auth0:fmi-test</saml2:Audience></saml2:AudienceRestriction></saml2:Conditions><saml2:AuthnStatement AuthnInstant="2014-04-06T22:27:04.858Z" SessionIndex="_ff0e0b5d9d6706bc22561c49c7eac971"><saml2:SubjectLocality Address="80.218.183.64"></saml2:SubjectLocality><saml2:AuthnContext><saml2:AuthnContextClassRef>urn:oasis:names:tc:SAML:2.0:ac:classes:PasswordProtectedTransport</saml2:AuthnContextClassRef></saml2:AuthnContext></saml2:AuthnStatement><saml2:AttributeStatement><saml2:Attribute FriendlyName="eduPersonAffiliation" Name="urn:oid:1.3.6.1.4.1.5923.1.1.1.1" NameFormat="urn:oasis:names:tc:SAML:2.0:attrname-format:uri"><saml2:AttributeValue xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:type="xs:string">member</saml2:AttributeValue><saml2:AttributeValue xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="xs:string">staff</saml2:AttributeValue><saml2:AttributeValue xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="xs:string">student</saml2:AttributeValue></saml2:Attribute><saml2:Attribute FriendlyName="sn" Name="urn:oid:2.5.4.4" NameFormat="urn:oasis:names:tc:SAML:2.0:attrname-format:uri"><saml2:AttributeValue xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="xs:string">GnÃ¼gge</saml2:AttributeValue></saml2:Attribute><saml2:Attribute FriendlyName="givenName" Name="urn:oid:2.5.4.42" NameFormat="urn:oasis:names:tc:SAML:2.0:attrname-format:uri"><saml2:AttributeValue xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="xs:string">Robert</saml2:AttributeValue></saml2:Attribute><saml2:Attribute FriendlyName="swissEduPersonHomeOrganization" Name="urn:oid:2.16.756.1.2.5.1.1.4" NameFormat="urn:oasis:names:tc:SAML:2.0:attrname-format:uri"><saml2:AttributeValue xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="xs:string">ethz.ch</saml2:AttributeValue></saml2:Attribute><saml2:Attribute FriendlyName="swissEduPersonUniqueID" Name="urn:oid:2.16.756.1.2.5.1.1.1" NameFormat="urn:oasis:names:tc:SAML:2.0:attrname-format:uri"><saml2:AttributeValue xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="xs:string">[email protected]</saml2:AttributeValue></saml2:Attribute><saml2:Attribute FriendlyName="swissEduPersonHomeOrganizationType" Name="urn:oid:2.16.756.1.2.5.1.1.5" NameFormat="urn:oasis:names:tc:SAML:2.0:attrname-format:uri"><saml2:AttributeValue xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="xs:string">university</saml2:AttributeValue></saml2:Attribute><saml2:Attribute FriendlyName="eduPersonTargetedID" Name="urn:oid:1.3.6.1.4.1.5923.1.1.1.10" NameFormat="urn:oasis:names:tc:SAML:2.0:attrname-format:uri"><saml2:AttributeValue><saml2:NameID 
Format="urn:oasis:names:tc:SAML:2.0:nameid-format:persistent" NameQualifier="https://aai-logon.ethz.ch/idp/shibboleth" SPNameQualifier="urn:auth0:fmi-test">37J7PjSu8hkThPDMZOfZLtca0Ag=</saml2:NameID></saml2:AttributeValue></saml2:Attribute><saml2:Attribute FriendlyName="mail" Name="urn:oid:0.9.2342.19200300.100.1.3" NameFormat="urn:oasis:names:tc:SAML:2.0:attrname-format:uri"><saml2:AttributeValue xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="xs:string">[email protected]</saml2:AttributeValue></saml2:Attribute></saml2:AttributeStatement></saml2:Assertion>');

	var saml_passport = new SamlPassport({
		thumbprints: ['42FA24A83E107F6842E05D2A2CA0A0A0CA8A2031'],
		realm: 'urn:auth0:fmi-test',
		recipientUrl: 'https://fmi-test.auth0.com/login/callback',
		checkExpiration: false // don't check expiration since we are hardcoding the token
	});
	var profile = saml_passport.validateSamlAssertion(signedAssertion, function(err, profile) {
		if (err) return done(err);
		assert.ok(profile);
		done();
	});

	});

C3 BC (UTF-8 Hex)
1100 0011 1011 1100 (UTF-8 Binary)
xxx0 0011 xx11 1100 (UTF-8 Binary w/o formatting bits)

00 FC (Unicode Code Point Hex)
0000 0000 1111 1100 (Unicode Code Point Binary)

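Note that once the UTF-8 formatting bits are stripped, the remaining payload bits (00011 from the first byte and 111100 from the second) concatenate to 000 1111 1100. Padded with leading zeroes, that is 0000 0000 1111 1100, the same value 00FC as the Unicode code point.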
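
The bit arithmetic above can be checked with a few lines of JavaScript. This is only a sketch; `decodeTwoByteUtf8` is a hypothetical helper, not part of xmldom.

```javascript
// Hypothetical helper: recover the code point from a two-byte UTF-8 sequence.
// Two-byte sequences have the layout 110xxxxx 10xxxxxx.
function decodeTwoByteUtf8(first, second) {
  if ((first & 0xE0) !== 0xC0 || (second & 0xC0) !== 0x80) {
    throw new RangeError('not a two-byte UTF-8 sequence');
  }
  // Keep the low 5 bits of the lead byte and the low 6 bits of the continuation byte.
  return ((first & 0x1F) << 6) | (second & 0x3F);
}

var decoded = decodeTwoByteUtf8(0xC3, 0xBC);

// Format as a four-digit hex code point for comparison with the table above.
var hex = ('0000' + decoded.toString(16).toUpperCase()).slice(-4);

console.log(hex);
```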

From the XML 1.0 specification "4.1 Character and Entity References" https://www.w3.org/TR/xml/#sec-references:

If the character reference begins with &#x, the digits and letters up to the terminating ; provide a hexadecimal representation of the character's code point in ISO/IEC 10646. If it begins just with &#, the digits up to the terminating ; provide a decimal representation of the character's code point.

This would appear to indicate that the code point, not the UTF-8 encoded byte sequence, is the correct value to put in a character reference. Testing the following two documents in Chrome appears to confirm this.

<meta charset="UTF-8">
<html>
    <body>&#xC3;&#xBC;</body>
</html>

<meta charset="UTF-8">
<html>
  <body>&#x00FC;</body>
</html>
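
For reference, here is a minimal sketch of what the spec wording implies: each numeric character reference is decoded on its own as a code point, with no merging of neighbours. `decodeNumericCharRefs` is a hypothetical helper written for this comment, not xmldom's actual implementation.

```javascript
// Hypothetical helper: decode &#xHHHH; and &#DDDD; references one at a time,
// treating each value as a Unicode code point (XML 1.0, section 4.1).
function decodeNumericCharRefs(text) {
  return text.replace(/&#(x[0-9A-Fa-f]+|[0-9]+);/g, function (match, body) {
    var codePoint = body.charAt(0) === 'x'
      ? parseInt(body.slice(1), 16) // hexadecimal form: &#x...;
      : parseInt(body, 10);         // decimal form: &#...;
    return String.fromCodePoint(codePoint);
  });
}

// Two references, so two characters: U+00C3 followed by U+00BC.
var fromUtf8Bytes = decodeNumericCharRefs('&#xC3;&#xBC;');

// One reference, so one character: U+00FC.
var fromCodePoint = decodeNumericCharRefs('&#x00FC;');

console.log(fromUtf8Bytes.length, fromCodePoint.length);
```

Under this reading the first document displays the two characters Ã¼, and only the second displays ü.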

Based on the spec it seems then that the correct approach is to drop support for two-byte UTF-8 encoded character entity representations entirely.

ghost · 2018-04-25T00:58:56Z

Note that this is a potentially breaking change, which is why the major release version has been updated.

ziluvatar

Code is fine. I think we might need to integrate the change first and if we consider it stable enough we can merge and create the major version.
Summary: I'll leave the PR open for now.

ghost · 2018-08-20T14:54:43Z

@robinbijlani @machuga Does either of you know who owns this work now?

robinbijlani · 2018-08-21T20:52:36Z

Hi @mikeops. Yes, Apollo will be continuing this work.

ghost requested a review from ziluvatar April 25, 2018 00:58

ghost self-assigned this Apr 25, 2018

ghost added bug release labels Apr 25, 2018

ziluvatar suggested changes Apr 25, 2018

Mike Lee and others added 3 commits June 20, 2018 15:53

58696db Updated the tests and added the xmldom package that removes the unicode character entity pair smashing.

0ad84a9 Update the version number to indicate a major version change. This version includes an update to the xmldom dependency that will break an incorrect behavior where pairs of unicode XML character entities were combined into a single value.

c3f7132 fix test
