Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unicode XML character entities fix #82

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Unicode XML character entities fix #82

wants to merge 3 commits into from

Conversation

ghost
Copy link

@ghost ghost commented Apr 25, 2018

Updates xmldom to fix an issue where pairs of unicode values represented as XML-encoded character entities are combined into a single value.

This comes from an issue filed on an internal project where the signature digest was not being properly calculated for assertion values that use two or more XML-encoded Chinese characters.

The meat of the investigation of that internal ticket is below.

---- Reproduced Text ----
Update to this. I checked out a couple of what I-believe-to-be-authoritative documents about multiple-byte character entities.

According to the W3C QA doc "Using character escapes in markup and CSS":
https://www.w3.org/International/questions/qa-escapes

Supplementary characters. Supplementary characters are those Unicode characters beyond the Basic Multilingual Plane (BMP). In UTF-16 a supplementary character is encoded using two 16-bit surrogate code points from the BMP. Because of this, or because of experience with older versions of JavaScript syntax, some people think that supplementary characters need to be represented using two escapes, but this is incorrect – you must use the single, code point value for that character. For example, use 𣎴 rather than ��.

Note that the case for which the two-byte character entity squashing code was added appears to be the German "LATIN SMALL LETTER U DIAERESIS" also commonly referred to as an "umlaut".
http://www.fileformat.info/info/unicode/char/fc/index.htm

The test case in auth0/passport-wsfed-saml2 that attempts to validate multi character entity encoding is using the UTF-8 two-byte hexidecimal representation ü.

it('should validate an assertion with umlats xmldom', function (done) {
var signedAssertion = new Parser().parseFromString('<saml2:Assertion xmlns:saml2="urn:oasis:names:tc:SAML:2.0:assertion" xmlns:xs="http://www.w3.org/2001/XMLSchema" ID="_071de65ecb79185206fcb0789e9afd90" IssueInstant="2014-04-06T22:27:04.997Z" Version="2.0"><saml2:Issuer Format="urn:oasis:names:tc:SAML:2.0:nameid-format:entity">https://aai-logon.ethz.ch/idp/shibboleth</saml2:Issuer><ds:Signature xmlns:ds="http://www.w3.org/2000/09/xmldsig#"><ds:SignedInfo><ds:CanonicalizationMethod Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"></ds:CanonicalizationMethod><ds:SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#rsa-sha1"></ds:SignatureMethod><ds:Reference URI="#_071de65ecb79185206fcb0789e9afd90"><ds:Transforms><ds:Transform Algorithm="http://www.w3.org/2000/09/xmldsig#enveloped-signature"></ds:Transform><ds:Transform Algorithm="http://www.w3.org/2001/10/xml-exc-c14n#"><ec:InclusiveNamespaces xmlns:ec="http://www.w3.org/2001/10/xml-exc-c14n#" PrefixList="xs"></ec:InclusiveNamespaces></ds:Transform></ds:Transforms><ds:DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"></ds:DigestMethod><ds:DigestValue>jVMwKZ5O3hXfOf6tkVan2hnPW2w=</ds:DigestValue></ds:Reference></ds:SignedInfo><ds:SignatureValue>nq5nJangoli5J6uBF/sEeYyKL7+xepbsDmjT6mpggLmba6yR+lQaZmAGnti8nhZUPyXwZfZS3d9oH4upbRg56jdVVcPaZUhYOPW2T2etm7lxxaDlHDJo/E40KnBtGMn6Oxz23hXUrc6p6K4FFLCQwmsE3ZZlP/u8DcqKNl5X/D5udcCV75mjxnVKWuXu34Xw4uQEQBb+6UfGjDN1/91M6U3ZZ0iOSRsBC7+SYLVMbDZqGveioKjZMPBuHmoBwQxsCixu1var3LNyCFVRo0LV9qA5DhA5lyH209+kFsN9vqzHKkiOF+Wua+Ngh2oR/48CWfTOjDuvRpje1bICIwwCQg==</ds:SignatureValue><ds:KeyInfo><ds:X509Data><ds:X509Certificate>MIIFjzCCBHegAwIBAgIUZ+QtvaEucMtOcruHlzQrEDH92FMwDQYJKoZIhvcNAQEFBQAwazELMAkG\nA1UEBhMCQk0xGTAXBgNVBAoTEFF1b1ZhZGlzIExpbWl0ZWQxHzAdBgNVBAsTFnd3dy5xdW92YWRp\nc2dsb2JhbC5jb20xIDAeBgNVBAMTF1F1b1ZhZGlzIEdsb2JhbCBTU0wgSUNBMB4XDTEzMDQxNzA4\nMDYwNFoXDTE1MDQxNzA4MDYwNFowYzELMAkGA1UEBhMCQ0gxEDAOBgNVBAgTB1p1ZXJpY2gxEDAO\nBgNVBAcTB1p1ZXJpY2gxFDASBgNVBAoTC0VUSCBadWVyaWNoMRowGAYDVQQDExFhYWktbG9nb24u\nZXRoei5jaDCCASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoCggEBAOJWLI4vWx5HnqUvkBDm5Egp\nUg8yOlL3HbS0Y62/k77R2W9wxNczcR79wUBl2cNDCF/LxzdY1ml2u2skbZy4tqtmcvHVrwM5RVDb\n3jpjUhzBlD5rkpxgut2zFmNsahXzceD9dzsTvq7MUq6YgW6iRY3wNbes7ZgRtdkCz+vbiB52iTES\nZ2lo6fBn69eiqywUhQ5t/K4jGqpSUf1DITz//lMWRveagVyUq342JONxo93nt6x6ewGg+Qo8yCuC\nj4VehpncHYV0oNI2sSncKPm23Z4TNxPDalSaq8R5nKhueG+FHX7Ks8hWYSf42m2rrZLTumv2Ry8H\nFrPFkI7kuSFwVRECAwEAAaOCAjEwggItMHQGCCsGAQUFBwEBBGgwZjAqBggrBgEFBQcwAYYeaHR0\ncDovL29jc3AucXVvdmFkaXNnbG9iYWwuY29tMDgGCCsGAQUFBzAChixodHRwOi8vdHJ1c3QucXVv\ndmFkaXNnbG9iYWwuY29tL3F2c3NsaWNhLmNydDCBtQYDVR0RBIGtMIGqghFhYWktbG9nb24uZXRo\nei5jaIIPdmNpcGhlci5ldGh6LmNogg92Y2Flc2FyLmV0aHouY2iCD3ZjdXJ0ZXIuZXRoei5jaIIP\ndmNvcHBlci5ldGh6LmNogg92Y2Vuc29yLmV0aHouY2iCEmxkYXBzLWluZm8uZXRoei5jaIIPbGlu\ndGVzdC5ldGh6LmNogRt2bGFkaXNsYXYubmVzcG9yQGlkLmV0aHouY2gwUQYDVR0gBEowSDBGBgwr\nBgEEAb5YAAJkAQEwNjA0BggrBgEFBQcCARYoaHR0cDovL3d3dy5xdW92YWRpc2dsb2JhbC5jb20v\ncmVwb3NpdG9yeTAOBgNVHQ8BAf8EBAMCBaAwHQYDVR0lBBYwFAYIKwYBBQUHAwEGCCsGAQUFBwMC\nMB8GA1UdIwQYMBaAFDJNoU/q8K6Ztu6bByyECBFQi+J+MDsGA1UdHwQ0MDIwMKAuoCyGKmh0dHA6\nLy9jcmwucXVvdmFkaXNnbG9iYWwuY29tL3F2c3NsaWNhLmNybDAdBgNVHQ4EFgQUUrfY5AJdnN5W\n9TTyrVObbQEoH/cwDQYJKoZIhvcNAQEFBQADggEBAJHQIjLbalw9LF9wIjhhOsEsaf/Bd8dSKcb2\nICLC16TyetuTTJfqHqHr3QiAcrSNKOxqoFBX51t7oNyd3n1BGxJeYmpoyKHKmViUF9mJWBKxSvfW\njmYA7M/LptNX+aUz0fPntCokjH5pPAk3n5YYf2gTFOmRbZDdvNxQ0+o5EkRKkxLDAYM7HlJshWfK\nyY8ZKiPSx28ebXORGzW/VC5VunURFPmhvy5hUFo2qFhGhkQZD1Tg5uN+vd7KywgXLiQKWFDweOxY\nkFuTatM9peWNaapAuaYL8D6q/pn6q76cDKiMjTLp1siQsVVzFAZNjywOve5tdqB/Qo7zwX7TggF1\nmrQ=</ds:X509Certificate></ds:X509Data></ds:KeyInfo></ds:Signature><saml2:Subject><saml2:NameID Format="urn:oasis:names:tc:SAML:2.0:nameid-format:transient" NameQualifier="https://aai-logon.ethz.ch/idp/shibboleth">_e132eb870c4a912c56e1bafeb5257b35</saml2:NameID><saml2:SubjectConfirmation Method="urn:oasis:names:tc:SAML:2.0:cm:bearer"><saml2:SubjectConfirmationData Address="80.218.183.64" InResponseTo="_e8cda4c13682e111e66d" NotOnOrAfter="2014-04-06T22:32:04.997Z" Recipient="https://fmi-test.auth0.com/login/callback"></saml2:SubjectConfirmationData></saml2:SubjectConfirmation></saml2:Subject><saml2:Conditions NotBefore="2014-04-06T22:27:04.997Z" NotOnOrAfter="2014-04-06T22:32:04.997Z"><saml2:AudienceRestriction><saml2:Audience>urn:auth0:fmi-test</saml2:Audience></saml2:AudienceRestriction></saml2:Conditions><saml2:AuthnStatement AuthnInstant="2014-04-06T22:27:04.858Z" SessionIndex="_ff0e0b5d9d6706bc22561c49c7eac971"><saml2:SubjectLocality Address="80.218.183.64"></saml2:SubjectLocality><saml2:AuthnContext><saml2:AuthnContextClassRef>urn:oasis:names:tc:SAML:2.0:ac:classes:PasswordProtectedTransport</saml2:AuthnContextClassRef></saml2:AuthnContext></saml2:AuthnStatement><saml2:AttributeStatement><saml2:Attribute FriendlyName="eduPersonAffiliation" Name="urn:oid:1.3.6.1.4.1.5923.1.1.1.1" NameFormat="urn:oasis:names:tc:SAML:2.0:attrname-format:uri"><saml2:AttributeValue xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="xs:string">member</saml2:AttributeValue><saml2:AttributeValue xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="xs:string">staff</saml2:AttributeValue><saml2:AttributeValue xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="xs:string">student</saml2:AttributeValue></saml2:Attribute><saml2:Attribute FriendlyName="sn" Name="urn:oid:2.5.4.4" NameFormat="urn:oasis:names:tc:SAML:2.0:attrname-format:uri"><saml2:AttributeValue xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="xs:string">Gn&#xC3;&#xBC;gge</saml2:AttributeValue></saml2:Attribute><saml2:Attribute FriendlyName="givenName" Name="urn:oid:2.5.4.42" NameFormat="urn:oasis:names:tc:SAML:2.0:attrname-format:uri"><saml2:AttributeValue xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="xs:string">Robert</saml2:AttributeValue></saml2:Attribute><saml2:Attribute FriendlyName="swissEduPersonHomeOrganization" Name="urn:oid:2.16.756.1.2.5.1.1.4" NameFormat="urn:oasis:names:tc:SAML:2.0:attrname-format:uri"><saml2:AttributeValue xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="xs:string">ethz.ch</saml2:AttributeValue></saml2:Attribute><saml2:Attribute FriendlyName="swissEduPersonUniqueID" Name="urn:oid:2.16.756.1.2.5.1.1.1" NameFormat="urn:oasis:names:tc:SAML:2.0:attrname-format:uri"><saml2:AttributeValue xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="xs:string">[email protected]</saml2:AttributeValue></saml2:Attribute><saml2:Attribute FriendlyName="swissEduPersonHomeOrganizationType" Name="urn:oid:2.16.756.1.2.5.1.1.5" NameFormat="urn:oasis:names:tc:SAML:2.0:attrname-format:uri"><saml2:AttributeValue xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="xs:string">university</saml2:AttributeValue></saml2:Attribute><saml2:Attribute FriendlyName="eduPersonTargetedID" Name="urn:oid:1.3.6.1.4.1.5923.1.1.1.10" NameFormat="urn:oasis:names:tc:SAML:2.0:attrname-format:uri"><saml2:AttributeValue><saml2:NameID Format="urn:oasis:names:tc:SAML:2.0:nameid-format:persistent" NameQualifier="https://aai-logon.ethz.ch/idp/shibboleth" SPNameQualifier="urn:auth0:fmi-test">37J7PjSu8hkThPDMZOfZLtca0Ag=</saml2:NameID></saml2:AttributeValue></saml2:Attribute><saml2:Attribute FriendlyName="mail" Name="urn:oid:0.9.2342.19200300.100.1.3" NameFormat="urn:oasis:names:tc:SAML:2.0:attrname-format:uri"><saml2:AttributeValue xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="xs:string">[email protected]</saml2:AttributeValue></saml2:Attribute></saml2:AttributeStatement></saml2:Assertion>');
var saml_passport = new SamlPassport({thumbprints: ['42FA24A83E107F6842E05D2A2CA0A0A0CA8A2031'],
realm: 'urn:auth0:fmi-test',
recipientUrl: 'https://fmi-test.auth0.com/login/callback',
checkExpiration: false}); // dont check expiration since we are harcoding the token
var profile = saml_passport.validateSamlAssertion(signedAssertion, function(err, profile) {
if (err) return done(err);
assert.ok(profile);
done();
});
});

C3 BC (UTF-8 Hex)
1100 0011 1011 1100 (UTF-8 Binary)
xxx0 0011 xx11 1100 (UTF-8 Binary w/o formatting bits)

00 FC (Unicode Code Point Hex)
0000 0000 1111 1100 (Unicode Code Point Binary)

Note that the Unicode Binary and UTF-8 w/o formatting bits generates the same representation 00FC after padding the value with leading zeroes.

From the XML 1.0 specification "4.1 Character and Entity References" https://www.w3.org/TR/xml/#sec-references:

If the character reference begins with &#x, the digits and letters up to the terminating ; provide a hexadecimal representation of the character's code point in ISO/IEC 10646. If it begins just with &#, the digits up to the terminating ; provide a decimal representation of the character's code point.

This would appear to indicate that the "code point" is the correct value representation and not the UTF-8 encoded value. This appears to be confirmed by testing using the Chrome browser and HTML.

<meta charset="UTF-8">
<html>
    <body>&#xC3;&#xBC;</body>
</html>

screen shot 2018-04-24 at 5 54 34 pm

<meta charset="UTF-8">
<html>
  <body>&#x00FC;</body>
</html>

screen shot 2018-04-24 at 5 55 02 pm

Based on the spec it seems then that the correct approach is to drop support for two-byte UTF-8 encoded character entity representations entirely.

@ghost ghost requested a review from ziluvatar April 25, 2018 00:58
@ghost ghost self-assigned this Apr 25, 2018
@ghost ghost added bug release labels Apr 25, 2018
@ghost
Copy link
Author

ghost commented Apr 25, 2018

Note that the following change is a potentially breaking change which is why the major release version has been updated.

Copy link
Contributor

@ziluvatar ziluvatar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code is fine. I think we might need to integrate the change first and if we consider it stable enough we can merge and create the major version.
Summary: I'll leave the PR open for now.

Mike Lee and others added 3 commits June 20, 2018 15:53
…rsion includes an update to the xmldom dependency that will break an incorrect behavior where pairs of unicode XML character entities were combined into a single value.
@ghost
Copy link
Author

ghost commented Aug 20, 2018

@robinbijlani @machuga Do one of you know who owns this work now?

@robinbijlani
Copy link
Contributor

Hi @mikeops . Yes, Apollo will be handling continuing this work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants