Support i18n in URLs #5
Comments
Normalisation for domain names is hard. Links
Started work on a new module to provide the required functionality: https://github.com/daurnimator/lua-unistring I don't know how I feel about adding a dependency to lpeg_patterns, though.
Interesting discussion in https://tools.ietf.org/html/draft-ietf-iri-3987bis-13 (found via http://blog.jclark.com/2008/11/what-allowed-in-uri.html, thanks @jclark) about the 'ucschar' production
More URL problems are also detailed in https://tools.ietf.org/html/draft-ruby-url-problem-01 and I blogged about a few of them a while ago: https://daniel.haxx.se/blog/2016/05/11/my-url-isnt-your-url/ There really is no good URL standard right now.
libicu has TR46/UTS#46 support (transitional and non-transitional), but as you said (@daurnimator), your code has to work as a plugin on systems without libicu. I mention this just for the record: there is an 'easy' solution. libidn (= IDNA 2003) is obsolete and risky to use, and libidn2 currently lacks UTS#46.
I came up with this snippet that generates the IdnaMappingTable in pure Lua: https://gist.github.com/daurnimator/be276c5d32329e2a9250f4aabeea48a8 The generated file is 880K. However, loading it into memory seems to take up ~5.5M, which makes me think it's not a good solution.
@rockdaboot do I recall you saying libidn2 had some fixes and is now a good solution?
Yes, libidn2 0.14 (in Debian unstable, maybe also already in testing) has TR46 support. When using idn2_lookup_*, add either IDN2_TRANSITIONAL or IDN2_NONTRANSITIONAL to the flags to get TR46 transitional or TR46 non-transitional behavior. Another good thing about TR46 is that you don't have to lowercase and/or NFC-normalise the input yourself: the TR46 processing does this automatically.
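To illustrate the lowercase and NFC steps that TR46 processing is said to take care of, here is a rough Python sketch of doing them by hand. `preprocess_domain` is a hypothetical helper name, and `str.lower()` only approximates the UTS#46 mapping table, so treat this as an illustration of the manual work, not as a TR46 implementation.

```python
import unicodedata


def preprocess_domain(domain: str) -> str:
    """Hypothetical helper: lowercase, then NFC-normalise a domain string.

    This only approximates the pre-processing that TR46 performs itself;
    the real UTS#46 mapping table covers far more than simple lowercasing.
    """
    return unicodedata.normalize("NFC", domain.lower())


# 'A' followed by U+030A COMBINING RING ABOVE is a decomposed, uppercase 'Å'.
decomposed = "A\u030alesund.no"
prepared = preprocess_domain(decomposed)

# After lowercasing and NFC the first character is the single code point U+00E5.
print(prepared == "\u00e5lesund.no")
```

Without TR46 support in the library, a caller would need a step like this before the lookup; with the TR46 flags it should be unnecessary.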
Today I packaged libidn2 for Arch: https://aur.archlinux.org/packages/libidn2/
Would love support for both IDN-encoded domains
http://øl.no/
and encoded paths and query args, like http://google.com/?q=æøå
or http://google.com/å
Relevant RFC:
https://www.ietf.org/rfc/rfc3987.txt
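For reference, the three example URLs above could be mapped to plain ASCII URIs roughly as follows. This is a minimal Python sketch, not what this project does: `iri_to_uri` is a hypothetical name, the host is converted with Python's built-in `idna` codec (which implements IDNA 2003 only, so it does not follow UTS#46), and the other components are percent-encoded as UTF-8 in the spirit of RFC 3987.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri: str) -> str:
    """Hypothetical helper: convert an IRI to an ASCII-only URI.

    Assumptions: no userinfo in the authority (it would be dropped here),
    and IDNA 2003 via the stdlib codec is acceptable for the host.
    """
    parts = urlsplit(iri)

    # Host: Unicode labels -> Punycode ("xn--") labels.
    host = parts.hostname.encode("idna").decode("ascii") if parts.hostname else ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"

    # Path, query and fragment: percent-encode non-ASCII as UTF-8 bytes.
    # '%' is kept safe so existing escapes are not encoded twice.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")

    return urlunsplit((parts.scheme, host, path, query, fragment))


print(iri_to_uri("http://øl.no/"))             # http://xn--l-4ga.no/
print(iri_to_uri("http://google.com/?q=æøå"))  # http://google.com/?q=%C3%A6%C3%B8%C3%A5
print(iri_to_uri("http://google.com/å"))       # http://google.com/%C3%A5
```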