Confusion about pipe characters in paths #852

Closed
yawkat opened this issue Jan 23, 2025 · 9 comments

Comments

@yawkat

yawkat commented Jan 23, 2025

What is the issue with the URL Standard?

I'm debugging a software incompatibility around pipe characters (|, U+007C) in URL paths. Neither RFC 2396 nor RFC 3986 permits pipes in URIs according to their ABNF. The WHATWG URL spec does not seem to permit them either (pipes are not part of URL units). However, the path percent-encode set does not include the pipe character.

A quick survey of what browsers do shows that:

  • encodeURI and encodeURIComponent do encode the pipe character, which makes sense because they use different percent-encode sets, not the path one.
  • In Firefox, a link <a href="foo|barä"> sends GET /foo|bar%C3%A4, so it does use the path percent-encode set. The httpwg HTTP spec seems to forbid this since it references RFC 3986, but Firefox sends it anyway.
  • In Chromium, the same link sends GET /foo%7Cbar%C3%A4, so it encodes both characters (but, interestingly, only displays the decoded ä in the URL bar).
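
For reference, the difference between these encoders can be reproduced outside a browser. The following is a minimal sketch, assuming a JavaScript runtime whose `URL` implementation follows the standard (for example a recent Node.js). The host `example.test` is a placeholder and does not come from the report.

```javascript
// Input taken from the <a href="foo|barä"> example above.
const rawPath = "foo|barä";

// encodeURI and encodeURIComponent use their own character sets,
// so both escape the pipe as well as the non-ASCII character.
const viaEncodeURI = encodeURI(rawPath);
const viaEncodeURIComponent = encodeURIComponent(rawPath);

// The URL parser applies the path percent-encode set, which does not
// contain "|". example.test is a placeholder base URL (assumption).
const parsedUrl = new URL(rawPath, "https://example.test/");

console.log(viaEncodeURI);
console.log(viaEncodeURIComponent);
console.log(parsedUrl.pathname);
```

With a standard-conforming parser, the two `encodeURI*` calls should yield `foo%7Cbar%C3%A4` while `pathname` should keep the pipe as `/foo|bar%C3%A4`, matching the Firefox request line. Per the report, Chromium's own parser behaves differently here.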

The WHATWG URL spec itself does not seem to be inconsistent. In the spec, the path percent-encode set is only used in the URL decoding logic; nowhere does it actually say that path components should be encoded with this set, even if that is implied.

So, what is actually the right behavior here? Should pipes in path segments be percent-encoded or not, and if so, why doesn't Firefox do it? And should the path percent-encode set be adjusted to include |?

@annevk
Member

annevk commented Jan 23, 2025

This is a bug in Chromium. You can use https://jsdom.github.io/whatwg-url/#url=aHR0cHM6Ly90ZXN0L3w/fCN8&base=YWJvdXQ6Ymxhbms= to compare to the standard.
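
The `url` and `base` parameters in that link are base64-encoded. As a rough sketch (assuming a runtime that provides the global `atob` and `URL`, such as a current browser or Node.js), they can be decoded and passed to the runtime's own parser for comparison with the live viewer:

```javascript
// Parameters copied from the live viewer link above.
const encodedInput = "aHR0cHM6Ly90ZXN0L3w/fCN8";
const encodedBase = "YWJvdXQ6Ymxhbms=";

// Decode the base64 parameters back into the strings the viewer parses.
const inputString = atob(encodedInput);
const baseString = atob(encodedBase);

// Parse with the runtime's URL implementation and inspect each component.
const viewerUrl = new URL(inputString, baseString);
console.log({
  input: inputString,
  base: baseString,
  pathname: viewerUrl.pathname,
  search: viewerUrl.search,
  hash: viewerUrl.hash,
});
```

The decoded input is `https://test/|?|#|`, which places a pipe in the path, the query, and the fragment. None of the corresponding percent-encode sets in the standard contains `|`, so a conforming parser should report all three components with the pipe left as-is.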

We encourage people to not create such URLs as they might result in interoperability issues, but there is nothing preventing them from being created.

@yawkat
Author

yawkat commented Jan 23, 2025

But isn't the definition of valid URL in "4.3 URL writing" incorrect, then? It does not list pipe as an allowed character in URL units.

@annevk
Member

annevk commented Jan 23, 2025

No, it's correct. I recommend reading https://url.spec.whatwg.org/#urls

@yawkat
Author

yawkat commented Jan 23, 2025

I still don't understand.

  • valid URL string does not permit pipes.
  • Browsers send pipes as URLs anyway (or are supposed to), when the HTML says so.
  • when using the basic URL parser, the URL will produce an invalid-URL-unit error, because pipe is not in URL units.

How does this mesh? You're saying not to percent-encode pipes on the URL encoding side, but at the same time, you're telling parsers not to accept unencoded pipes.

Is this just an HTML-specific quirk? You're saying URLs normally shouldn't include pipes, but if an HTML document has an unencoded pipe in a link, we should assume the server URL parser is out-of-spec and accepts it, so we should send it as-is? That would make sense from a WHATWG standpoint (since you are concerned with HTML / browsers specifically), but it apparently leads to insufficient escaping for non-HTML applications that use your URL encoding sets.

@annevk
Member

annevk commented Jan 23, 2025

The parser records an error, it's not terminated.
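
To make the distinction concrete, here is a simplified, hypothetical model of how one path segment could pass through such a parser. It is a sketch, not the standard's algorithm: the names `parseSegmentSketch`, `ASCII_URL_CODE_POINT`, and `PATH_ENCODE_ASCII` are invented for illustration, non-ASCII code points from U+00A0 upward are all treated as URL code points, and the check for malformed `%` sequences is omitted.

```javascript
// ASCII subset of the URL code points (assumption: abridged from the standard).
const ASCII_URL_CODE_POINT = /^[A-Za-z0-9!$&'()*+,\-.\/:;=?@_~]$/;
// ASCII members of the path percent-encode set, besides C0 controls.
const PATH_ENCODE_ASCII = new Set([" ", '"', "#", "<", ">", "?", "^", "`", "{", "}"]);

// Percent-encode every UTF-8 byte of a single code point.
function utf8PercentEncode(ch) {
  return Array.from(new TextEncoder().encode(ch), (byte) =>
    "%" + byte.toString(16).toUpperCase().padStart(2, "0")
  ).join("");
}

function parseSegmentSketch(segment) {
  const validationErrors = [];
  let output = "";
  for (const ch of segment) {
    const cp = ch.codePointAt(0);
    const isUrlCodePoint = cp >= 0xa0 || ASCII_URL_CODE_POINT.test(ch);
    if (!isUrlCodePoint && ch !== "%") {
      // The error is recorded, and parsing simply continues.
      validationErrors.push({ type: "invalid-URL-unit", codePoint: ch });
    }
    // Encoding is decided separately, by the path percent-encode set.
    const inPathEncodeSet = cp <= 0x1f || cp > 0x7e || PATH_ENCODE_ASCII.has(ch);
    output += inPathEncodeSet ? utf8PercentEncode(ch) : ch;
  }
  return { output, validationErrors };
}

console.log(parseSegmentSketch("foo|barä"));
```

In this model the pipe produces one recorded `invalid-URL-unit` entry, yet the segment still parses to `foo|bar%C3%A4`. Validity and serialization are decided by two different sets, which is why an invalid input can still yield a usable URL.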

@yawkat
Author

yawkat commented Jan 23, 2025

Ahh I see. So invalidity does not necessarily imply you should behave differently. That is so weird. Thanks!

@yawkat closed this as not planned on Jan 23, 2025
@annevk
Member

annevk commented Jan 23, 2025

Right, it's more a sign of "proceed at your own peril". (This is how conformance works for CSS and HTML as well. It's not entirely out there I'd say, but perhaps a bit novel if you're more protocol-inclined.)

@yawkat
Author

yawkat commented Jan 23, 2025

Yea it makes sense now, and we'll definitely have to handle it properly, it just never occurred to me that a validation error would not be fatal. That's what I get for only skipping between sections, not reading the standard in full.

@domenic
Member

domenic commented Jan 24, 2025

  • Browsers send pipes as URLs anyway (or are supposed to), when the HTML says so.

FWIW, there's a thread about how this is bad in my opinion, but I haven't managed to convince the editor (@annevk): #379. #379 (comment) is my attempt at a concise summary of the problem.


3 participants