Parsing square brackets ([]) in path, query, and fragment #595

Closed
takenspc opened this issue Apr 28, 2021 · 2 comments · Fixed by #666
Labels
topic: validation Pertaining to the rules for URL writing and validity (as opposed to parsing)

Comments

@takenspc

URL parsers in the wild seem to allow square brackets ([]) in the path, query, and fragment. The URL spec, on the other hand, seems to say that square brackets in the path, query, and fragment cause a validation error.

My question is which one is correct:

  • URL parsers are correct, and the spec should be tweaked
  • the spec is correct, and URL parsers should be tweaked
  • both are correct (I'm wrong)

My opinion is that the URL parsers are correct, though I'm not too sure. Please let me know if I missed something.


URL parsers in the wild allow square brackets in path, query, and fragment:

new URL('https://example.com/[]?[]#[]'); // doesn't throw
// URL {
//   href: 'https://example.com/[]?[]#[]',
//   origin: 'https://example.com',
//   protocol: 'https:',
//   username: '',
//   password: '',
//   host: 'example.com',
//   hostname: 'example.com',
//   port: '',
//   pathname: '/[]',
//   search: '?[]',
//   searchParams: URLSearchParams { '[]' => '' },
//   hash: '#[]'
// }

I tested with Node.js 16 (stable), Firefox 90 (nightly) and Chrome 90 (stable).


The URL spec says square brackets in the path, query, and fragment cause a validation error.

In the basic URL parser's path state (step 2), query state (step 3), and fragment state (step 1):

  • If c is not a URL code point and not U+0025 (%), validation error.
  • If c is U+0025 (%) and remaining does not start with two ASCII hex digits, validation error.
  • UTF-8 percent-encode c using the path percent-encode set and append the result to buffer.

The URL code points do not include square brackets (U+005B ([) and U+005D (])).

The URL code points are ASCII alphanumeric, U+0021 (!), U+0024 ($), U+0026 (&), U+0027 ('), U+0028 LEFT PARENTHESIS, U+0029 RIGHT PARENTHESIS, U+002A (*), U+002B (+), U+002C (,), U+002D (-), U+002E (.), U+002F (/), U+003A (:), U+003B (;), U+003D (=), U+003F (?), U+0040 (@), U+005F (_), U+007E (~), and code points in the range U+00A0 to U+10FFFD, inclusive, excluding surrogates and noncharacters.
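
The check quoted above can be sketched roughly as follows. This is only an illustration under assumptions: `isURLCodePoint` and `findValidationErrors` are hypothetical helper names, not part of the spec or of any library, and the scan is applied to the contents of a single component (path, query, or fragment) without modelling the parser's state machine or its delimiter handling.

```javascript
// Hypothetical helper: is this code point in the "URL code points" set quoted above?
function isURLCodePoint(cp) {
  if (cp < 0x80) {
    const ch = String.fromCodePoint(cp);
    // ASCII alphanumeric, plus the punctuation listed in the spec text.
    return /^[A-Za-z0-9]$/.test(ch) || "!$&'()*+,-./:;=?@_~".includes(ch);
  }
  if (cp < 0xA0 || cp > 0x10FFFD) return false;      // outside U+00A0..U+10FFFD
  if (cp >= 0xD800 && cp <= 0xDFFF) return false;    // surrogates
  if (cp >= 0xFDD0 && cp <= 0xFDEF) return false;    // noncharacters U+FDD0..U+FDEF
  if ((cp & 0xFFFE) === 0xFFFE) return false;        // noncharacters U+xxFFFE / U+xxFFFF
  return true;
}

// Hypothetical helper: collect the code points that would be reported as
// validation errors by the first two quoted steps. Nothing is thrown,
// which mirrors the fact that a validation error does not stop parsing.
function findValidationErrors(component) {
  const errors = [];
  for (let i = 0; i < component.length; ) {
    const cp = component.codePointAt(i);
    const ch = String.fromCodePoint(cp);
    if (ch === "%") {
      // "%" must be followed by two ASCII hex digits.
      if (!/^[0-9A-Fa-f]{2}/.test(component.slice(i + 1))) {
        errors.push({ index: i, char: ch });
      }
    } else if (!isURLCodePoint(cp)) {
      errors.push({ index: i, char: ch });
    }
    i += ch.length; // advance by one code point (1 or 2 UTF-16 code units)
  }
  return errors;
}

console.log(findValidationErrors("[]"));  // one entry for "[" and one for "]"
console.log(findValidationErrors("%41")); // no entries
```

Under this sketch, `[` and `]` are flagged only because they are missing from the code point list; the scan itself never rejects the input.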

@annevk
Member

annevk commented Apr 28, 2021

The output of the URL parser is not necessarily a URL record that, when serialized, is a valid URL string.

Perhaps we should add this example to https://url.spec.whatwg.org/#urls and state it there.

See also #379, which I guess this is a duplicate of.
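
To illustrate the distinction between parsing and validity, here is a small sketch that reuses the input from the original report. It assumes a runtime with the WHATWG `URL` class (for example Node.js or a browser): a validation error does not surface through the `URL` constructor, and only a parse failure throws.

```javascript
// The input from the issue. Parsing succeeds even though "[" and "]"
// are not URL code points.
const parsed = new URL("https://example.com/[]?[]#[]");

// Serializing the resulting URL record keeps the brackets unescaped,
// so the serialization is not a valid URL string per the writing rules.
console.log(parsed.href); // https://example.com/[]?[]#[]

// By contrast, a parse failure does throw.
let threw = false;
try {
  new URL("not a url"); // no scheme and no base URL
} catch (e) {
  threw = e instanceof TypeError;
}
console.log(threw); // true
```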

@domenic domenic added the topic: validation Pertaining to the rules for URL writing and validity (as opposed to parsing) label Jul 1, 2021
annevk added a commit that referenced this issue Oct 21, 2021
annevk added a commit that referenced this issue Dec 9, 2022
annevk added a commit that referenced this issue Dec 9, 2022
@takenspc
Author

takenspc commented Mar 2, 2023

@annevk I appreciate your clarification!
