Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encode ' in username and password #608

Closed
TimothyGu opened this issue May 21, 2021 · 10 comments
Closed

Encode ' in username and password #608

TimothyGu opened this issue May 21, 2021 · 10 comments

Comments

@TimothyGu
Copy link
Member

u = new URL("http://us'er:pa'[email protected]/");
console.log(u.username);
console.log(u.password);

In Chrome and Firefox, the ' is escaped, in addition to Go. However, Safari, Node.js, and Ruby don't escape the '.

The case for ' is weaker than ^ in pathname (#607), but would allow us to align with 2/3 implementations still. In addition, we currently escape ; in userinfo, even though Go and Ruby don't escape it.

Making the change would be slightly tricky: adding ' to userinfo set would also add it to the component set; however, the component set is effectively frozen due to its alignment with encodeURIComponent.

Fun fact, Chrome's equivalent of the component set actually includes ' despite what its documentation claims. But registerProtocolHandler uses the "kQueryCharmap" defined in a separate part of the codebase, that happens to correspond exactly to encodeURIComponent.

@achristensen07
Copy link
Collaborator

I think I'm ok with aligning Safari with Chrome and Firefox.
Aside: Our tests really need to check each ASCII code point and a few other code points in each part of a URL to be complete.

@mnot
Copy link
Member

mnot commented May 22, 2021

3986 defines userinfo to include sub-delims, which includes '.

userinfo    = *( unreserved / pct-encoded / sub-delims / ":" )
unreserved  = ALPHA / DIGIT / "-" / "." / "_" / "~"
sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
              / "*" / "+" / "," / ";" / "="

So, it needs to be unescaped. Same for ;.

@TimothyGu
Copy link
Member Author

@mnot Why do you say that ' and ; need to be unescaped? RFC 3986 Sections 2.3 and 6.2.2.2 say that only unreserved characters need to be decoded:

For consistency, percent-encoded octets in the ranges of ALPHA (%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), period (%2E), underscore (%5F), or tilde (%7E) should not be created by URI producers and, when found in a URI, should be decoded to their corresponding unreserved characters by URI normalizers. (sec. 2.3)

[S]ome URI producers percent-encode octets that do not require percent-encoding, resulting in URIs that are equivalent to their non-encoded counterparts. These URIs should be normalized by decoding any percent-encoded octet that corresponds to an unreserved character, as described in Section 2.3. (sec. 6.2.2.2)

Nothing seems to forbid implementations from encoding sub-delims as they see appropriate, for e.g., XSS mitigations.

@alwinb
Copy link
Contributor

alwinb commented May 23, 2021

(Referencing #379 because it has a good overview)

@mnot
Copy link
Member

mnot commented May 24, 2021

Encoding raw characters that don't need to be encoded before they're consumed by applications is unexpected behaviour, without some sort of documentation or justification.

I could see mitigating XSS as a justification for serialising a URL with encoded ', but that wasn't discussed above, and we're talking about parsing behaviour -- the encoding is surfaced in the API offered to applications.

Given that WHATWG specs like URL are "low-level", I'd think that the focus would be on offering high-fidelity APIs for what's seen on the wire, rather than anticipating issues like XSS, when those only affect a subset of users (especially with .username and .password properties). That could be addressed by wrappers, or perhaps a separate function...

@TimothyGu
Copy link
Member Author

The WHATWG URL Standard internally stores components in a percent-encoded form, to make full URL serializing simpler (it's just string concatenation). That is already the status quo, and not a focus of this issue.

Though the original issue was phrased in terms of the JavaScript getters, it could equivalently be phrased as "should ' be percent-encoded in the serialized result (new URL(...).href)?"

@annevk annevk added needs implementer interest Moving the issue forward requires implementers to express interest topic: parser labels May 25, 2021
@annevk
Copy link
Member

annevk commented May 25, 2021

Is XSS really the argument here? Because ' is not percent-encoded in the path or fragment either.

@TimothyGu
Copy link
Member Author

@annevk maybe not in this particular case. Interop is perhaps the bigger case here.

Also, I would think that we already have implementer interest, considering Chrome and Firefox already implement this behavior?

@annevk annevk removed the needs implementer interest Moving the issue forward requires implementers to express interest label Jun 1, 2021
@annevk
Copy link
Member

annevk commented Jun 1, 2021

That's right.

I'm sympathetic to the point of view to not encode unless it's strictly needed, but I also don't care strongly what we end up doing for these particular components.

@annevk
Copy link
Member

annevk commented Dec 2, 2024

Firefox ended up aligning with Safari and the specification:

That makes me inclined to close this. cc @hayatoito

@annevk annevk closed this as not planned Won't fix, can't repro, duplicate, stale Dec 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

5 participants