Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should we escape \ in non-special non-opaque paths? #675

Closed
TimothyGu opened this issue Nov 30, 2021 · 6 comments
Closed

Should we escape \ in non-special non-opaque paths? #675

TimothyGu opened this issue Nov 30, 2021 · 6 comments
Labels
addition/proposal New features or enhancements needs implementer interest Moving the issue forward requires implementers to express interest topic: parser

Comments

@TimothyGu
Copy link
Member

Pros:

  • Matches RFC 3986 and Go
  • Might reduce path traversal vulnerabilities, for implementations that allow switching URLs from non-special to special schemes (i.e., all browsers)

Cons:

  • Doesn't match any browser (though to be fair, Chrome and Firefox don't actually support non-opaque paths in non-special URLs, and Safari implements the current spec)

Originally suggested by @karwa in #651 (comment).

@annevk
Copy link
Member

annevk commented Nov 30, 2021

I think a lot of these proposals would be easier to do once Chrome has proper non-special URL support. For better or worse they're in the best position to judge what changes can be made.

@alwinb
Copy link
Contributor

alwinb commented Jan 25, 2022

In any case. From the perspective of relative URLs, this is a good thing to do. The standard does not currently support parsing or serialising those, so decisions are often not checked against their implications for relative URLs either.

This means that it is left up to implementors of URL manipulation libraries to define a customised version of WHATWG URL normalisation that avoids reparse bugs for relative URLs and avoids as much ambiguity around scheme dependent parsing behaviour as possible.

Of course, the interpretation of \ being scheme dependent means that tools that produce relative, scheme-less URLs would better escape such occurrences, even if that does not agree with the normalisation as specified in the WHATWG standard. On the plus side, URLs with occurrences of \ are invalid, so I guess that isn’t so bad.

@annevk annevk added addition/proposal New features or enhancements needs implementer interest Moving the issue forward requires implementers to express interest labels Dec 16, 2022
@annevk
Copy link
Member

annevk commented Nov 29, 2024

Chrome Canary appears to have non-special URL support of some kind. Unclear to me if this is already shipping. But it does not escape \. No other browser does either. Example: https://jsdom.github.io/whatwg-url/#url=eDovL2FiYy5jb20vdGVzdFx0ZXN0&base=YWJvdXQ6Ymxhbms=

Closing based on that, but please leave a comment if this needs to be reconsidered.

@annevk annevk closed this as not planned Won't fix, can't repro, duplicate, stale Nov 29, 2024
@alwinb
Copy link
Contributor

alwinb commented Dec 3, 2024

I would want to leave a note. It seems that this decision (whether or not to encode the \) is now based plainly on the new behaviour as seen in chrome canary, and the behaviour of the other browsers at present.

I think that it is a bit short sighted, or let’s say; not taking into account and/or weighing other aspects; not weighing the trade offs being made.

The \ is a “problem code point”. There’s special cases for it scattered throughout the parser, and the meaning of \ in the current standard is context dependent. Without scheme information, its meaning in paths is ambiguous. And changing the scheme of an URL that has \ in its pathname can cause the path to radically change shape and (due to normalising dotted segments) content.

It is a very unpleasant property that parsing e.g. http+git://example.com/path\to\src\.. and then changing the scheme to http using the API results in something completely different than directly parsing http://example.com/path\to\src\.. does. It’s a congruence bug, which is closely related to what you have been calling a reparse bug here.

Of course, for changing the scheme of e.g. an http URL to one that has traditionally been used with opaque paths such as mailto: and javascript: this is necessary in one form or another (if you really want to allow that..). But when changing the scheme across special and non-special-but-hierarchical URLs, it is unexpected and weird.

It makes it so that the semantics of URLs is non-compositional. You can no longer replace equals for equals, you can no longer use algebraic reasoning.

I am not sure that can be fully recovered, but preventing the parser/ normaliser from producing URLs with such a problem code point as much as possible seems very important to me.

Single and double quotes and angle brackets too are required to be percent encoded. IIRC this has not always been consistent across browsers either. They are used as delimiters in html and xml and json, JavaScript, other context that URLs may appear in, so it can cause issues to not escape them and it was changed without problems. It doesn’t make sense to me to NOT require the same for \ which is way more problematic than the quotation marks since it really messes with the internal structure of an URL unlike the others!

@annevk
Copy link
Member

annevk commented Dec 4, 2024

Well part of the tradeoff here that I didn't mention is compatibility. And that's the most important one.

@alwinb
Copy link
Contributor

alwinb commented Dec 4, 2024

You know what, scratch my previous comment.

Let’s take a step back.

I have never seen an hierarchical URL with a \ in its path that should not be interpreted as a delimiter.

Isn’t it way better to just treat any \ in any URL string as / unless the scheme indicates an opaque path? (ref, #385).

Better backward compatibility, much simpler parser, fewer special cases.

The current alignment is cool, but you will manage to get that again, don’t let it prevent us from making solid improvements.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
addition/proposal New features or enhancements needs implementer interest Moving the issue forward requires implementers to express interest topic: parser
Development

No branches or pull requests

3 participants