XML character references are not unescaped/escaped #17
Comments
Something like the following may help (for unescaping hexadecimal numeric character references):

    function unescape_unicode(s::AbstractString)
        # Replace each hexadecimal reference (e.g. &#xC5;) in a single pass, so text
        # produced by a replacement is never rescanned. Only hex digits are accepted;
        # \u takes at most four of them, so code points above U+FFFF are not handled.
        return replace(s, r"&#x[0-9A-Fa-f]{1,4};" => m -> unescape_string("\\u" * m[4:end-1]))
    end
Hmm, these entities need to be defined in the DTD, correct? I think we'd need
Ah, yes, that's right. My ancient memory of XML, and of HTML in particular, led me to believe that these were built into XML as well, but I see now that XML only defines five entities, and the HTML-like entities are mostly or solely defined for HTML: https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references#Standard_public_entity_sets_for_characters Perhaps one could just have some HTML convenience escape methods...
Or perhaps provide something convenient for getting common DTDs like http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd
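To make the distinction discussed above concrete, here is a minimal sketch of a DTD-independent fallback: it resolves the five entities predefined by XML plus decimal and hexadecimal numeric character references, and leaves any other named entity (such as an HTML one) untouched. The names `PREDEFINED_ENTITIES`, `resolve_reference`, and `unescape_references` are hypothetical and not part of XML.jl, and entity names are assumed to be ASCII-only for simplicity.

```julia
# Hypothetical helper, not part of XML.jl: the five entities predefined by XML.
const PREDEFINED_ENTITIES = Dict(
    "lt" => '<', "gt" => '>', "amp" => '&', "apos" => '\'', "quot" => '"',
)

# Resolve one reference such as "&amp;", "&#197;" or "&#xC5;".
# Anything that cannot be resolved without a DTD is returned unchanged.
function resolve_reference(ref::AbstractString)
    body = chop(ref; head=1, tail=0)      # drop the leading '&'
    body = chop(body; head=0, tail=1)     # drop the trailing ';'
    if startswith(body, "#")
        hex = startswith(body, "#x")
        digits = chop(body; head=(hex ? 2 : 1), tail=0)
        code = tryparse(UInt32, digits; base=(hex ? 16 : 10))
        code === nothing && return String(ref)          # number too large
        # Accept only Unicode scalar values (no surrogates, nothing above U+10FFFF).
        valid = code <= 0x10FFFF && !(0xD800 <= code <= 0xDFFF)
        return valid ? string(Char(code)) : String(ref)
    end
    ch = get(PREDEFINED_ENTITIES, body, nothing)
    return ch === nothing ? String(ref) : string(ch)
end

# Single pass over the input, so "&amp;amp;" becomes "&amp;" and not "&".
function unescape_references(s::AbstractString)
    pattern = r"&(#x[0-9A-Fa-f]+|#[0-9]+|[A-Za-z_][A-Za-z0-9_.-]*);"
    return replace(s, pattern => resolve_reference)
end

println(unescape_references("&#xC5; &#197; &lt;b&gt; &Aring;"))  # prints: Å Å <b> &Aring;
```

Resolving `&Aring;` and similar names would still require the entity declarations from a DTD, which this sketch deliberately does not attempt.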
XML character entity references, e.g. &Aring; ("Å"), and XML numeric character references, e.g. &#xC5; ("Å"), are not unescaped/escaped by the XML.unescape and XML.escape methods.