Confusion about pipe characters in paths #852
Comments
This is a bug in Chromium. You can use https://jsdom.github.io/whatwg-url/#url=aHR0cHM6Ly90ZXN0L3w/fCN8&base=YWJvdXQ6Ymxhbms= to compare to the standard. We encourage people to not create such URLs as they might result in interoperability issues, but there is nothing preventing them from being created.
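For readers who want to check this locally: the `url` parameter in the link above is base64 for `https://test/|?|#|`. A minimal sketch, assuming a runtime whose `URL` class follows the URL Standard (such as a recent Node.js), shows the pipe left alone in path, query and fragment. The `a|b` username is an illustrative input that is not from the thread; it is there because the userinfo percent-encode set, unlike the path one, does include `|`.

```javascript
// Input taken from the base64-encoded `url` parameter of the live viewer link.
const parsed = new URL("https://test/|?|#|");

// The path, query and fragment percent-encode sets do not contain "|",
// so a standard-conforming parser keeps it in all three components.
console.log(parsed.pathname, parsed.search, parsed.hash);

// Illustrative contrast (input not from the thread): the userinfo
// percent-encode set does contain "|", so it is encoded there.
const withUser = new URL("https://a|b@test/");
console.log(withUser.username);
```

If the first line prints a `%7C` anywhere, the runtime deviates from the standard in the way described above.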
But isn't the definition of |
No, it's correct. I recommend reading https://url.spec.whatwg.org/#urls
I still don't understand.
How does this mesh? You're saying not to percent-encode pipes on the URL encoding side, but at the same time, you're telling parsers not to accept unencoded pipes. Is this just an HTML-specific quirk? You're saying URLs normally shouldn't include pipes, but if an HTML document has an unencoded pipe in a link, we should assume the server URL parser is out-of-spec and accepts it, so we should send it as-is? That would make sense from a whatwg standpoint (since you are concerned with HTML / browsers specifically), but it apparently leads to insufficient escaping for non-HTML applications that use your URL encoding sets.
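For a non-HTML application that wants output acceptable to an RFC 3986 parser, one option is to encode path segments against the RFC's `pchar` rule instead of the path percent-encode set. The helper below is a hypothetical sketch, not an API from either specification. It assumes the input is a raw, not yet encoded, segment, so a literal `%` is encoded as well.

```javascript
// RFC 3986 pchar without pct-encoded: unreserved / sub-delims / ":" / "@".
const PCHAR = /[A-Za-z0-9\-._~!$&'()*+,;=:@]/;

// Hypothetical helper: percent-encode every UTF-8 byte of a raw path
// segment that is not an RFC 3986 pchar.
function encodePathSegmentStrict(segment) {
  let out = "";
  for (const byte of new TextEncoder().encode(segment)) {
    const ch = String.fromCharCode(byte);
    if (byte < 0x80 && PCHAR.test(ch)) {
      out += ch;
    } else {
      out += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(encodePathSegmentStrict("foo|barä"));
```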
The parser records an error, it's not terminated.
Ahh I see. So invalidity does not necessarily imply you should behave differently. That is so weird. Thanks!
Right, it's more a sign of "proceed at your own peril". (This is how conformance works for CSS and HTML as well. It's not entirely out there I'd say, but perhaps a bit novel if you're more protocol-inclined.)
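The "record an error, keep going" behaviour can be sketched as follows. `parseAndLint` is a hypothetical helper, not part of any URL API, and its check is deliberately simplified: it flags ASCII characters that are not URL code points, lets `%` and `#` through, and treats all non-ASCII characters as acceptable, which is looser than the standard's definition.

```javascript
// ASCII URL code points per the URL Standard.
const ASCII_URL_CODE_POINT = /[A-Za-z0-9!$&'()*+,\-./:;=?@_~]/;

// Hypothetical helper: collect non-fatal validation errors, then parse anyway.
function parseAndLint(input, base) {
  const errors = [];
  for (const ch of input) {
    // Simplification: skip non-ASCII, percent signs and the fragment delimiter.
    if (ch.codePointAt(0) > 0x7f || ch === "%" || ch === "#") continue;
    if (!ASCII_URL_CODE_POINT.test(ch)) {
      errors.push(`invalid URL unit: ${JSON.stringify(ch)}`);
    }
  }
  // Validation errors do not stop parsing; only a real failure throws here.
  return { url: new URL(input, base), errors };
}

const result = parseAndLint("https://test/foo|bar");
console.log(result.errors, result.url.pathname);
```

The caller gets both the parsed URL and the list of complaints, and decides for itself whether "proceed at your own peril" is acceptable.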
Yea it makes sense now, and we'll definitely have to handle it properly, it just never occurred to me that a validation error would not be fatal. That's what I get for only skipping between sections, not reading the standard in full.
FWIW, there's a thread about how this is bad in my opinion, but I haven't managed to convince the editor (@annevk): #379. #379 (comment) is my attempt at a concise summary of the problem.
What is the issue with the URL Standard?

I'm debugging some software incompatibility specifically around pipe characters (`|`, U+007C) in URL paths. Neither RFC 2396 nor RFC 3986 permit pipes in URIs according to the ABNF. The whatwg url spec also does not seem to permit it (pipes are not part of `URL units`). However, the `path percent-encode set` does not include the pipe character.

A quick survey of what browsers do shows that:

- `encodeURI` and `encodeURIComponent` do encode the pipe character, which makes sense because they use different percent-encode sets, not the path one.
- `<a href="foo|barä">` sends `GET /foo|bar%C3%A4`, so it does use `path percent-encode set`. The httpwg HTTP spec seems to forbid this since it references RFC 3986, but firefox sends it anyway.
- `GET /foo%7Cbar%C3%A4`, so it encodes both (but only displays the decoded `ä` in the url bar, interestingly)

The whatwg url spec itself does not seem to be inconsistent. In the spec itself, `path percent-encode set` is only used in the URL decoding logic, nowhere does it actually say that path components should be encoded with this set, even if that is implied.

So, what is actually the right behavior here? Should pipes in path segments be percent-encoded or not, and if so, why doesn't firefox do it? And should the `path percent-encode set` be adjusted to include `|`?
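The difference between the encoding functions and the parser can be reproduced outside a browser. This is a small sketch, assuming a runtime with a standard-conforming `URL` class. The base `http://example.test/` is a placeholder chosen for illustration.

```javascript
const input = "foo|barä";

// encodeURI and encodeURIComponent use their own reserved sets,
// neither of which leaves "|" alone.
const viaEncodeURI = encodeURI(input);
const viaEncodeURIComponent = encodeURIComponent(input);

// The URL parser applies the path percent-encode set, which does not
// contain "|" but does cover non-ASCII characters such as "ä".
const viaParser = new URL(input, "http://example.test/").pathname;

console.log({ viaEncodeURI, viaEncodeURIComponent, viaParser });
```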