-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fetch: test nosniff parsing better #13559
Conversation
@MattMenke2 I guess what you noticed in Chrome about empty headers (which I hadn't noticed since they are serialized) is indeed true. E.g., in this nosniff test Chrome ends up ignoring empty values altogether. Firefox only does it when they are separate header lines. |
Except for certain headers, Chrome unconditionally splits header lines around commas into multiple header values, unconditionally allowing commas between both single and double quoted strings. This code also ignores empty values. A lot of parsers use the values output by this method, instead of the actual header lines we received, to avoid needing redundant code to split headers around commas. It does mostly allow retrieving the original header lines, unmodified, though even there, believe it will remove the leading and trailing commas (Since it goes from the start of the first parsed value to the end of the last, but leading/trailing commas don't add a value). |
… for MIME types anyway
That kind of generic handling makes sense to me, but it seems wrong to drop empty values as you argued elsewhere (though before discovering Chrome did so). It also seems wrong to have single quoted strings as they are not a thing in HTTP I think? Irrespective of what Chrome does today, what do you think the future should look like here? |
And add some of the infrastructure needed to define parsing better for all headers going forward (needed for #814). Fixes #752. This also fixes an issue with CORB as it simply assumed an X-Content-Type-Options was present. Tests: web-platform-tests/wpt#13559.
The problem with generic quote handling is that Content-Type, at least, specifies its own quote handling, which isn't compatible with it: Content-Type: text/html; foo= ","; charset=GBK If you split around commas with quotes and without quote handling, you get different results. Content-Type rules don't consider that a quoted string. Of course, it could be that Content-Type should be considered a non-coalescing header, which resolves the issue, but anything that only allows quoted strings in certain parts of the header will potentially have similar problems., if commas should also separate multiple values. I'd really like to be able to have some unambiguous rules for how to handle these situations - merge all instances of a header, separated by commas, works, though it then means that we either need a general way to separate in values around commas, independent of header, or Chrome's approach doesn't work. Such rules don't currently exist, and I feel that specifying quote handling on a per-header basis isn't compatible with rules of that type, making everything essentially a non-coalescing header. If common code to split lines doesn't work, that means a lot more code to handle each individual header type, which seems more likely to result in bugs and inconsistent handling of headers. It has been looking to me like this is the direction we're going in (Which may be necessary, and I can live with, but does mean Chrome's current approach here is wrong). I'd also like to have only a few parsing rules apply to all headers. For instance, one rule for everything of the form "Foo: a; b=c; d=e, x; y=z, s, t" - consistent rules for quote handling (Which would apply to everything from Accept-Language to Content-Type, etc). |
Thanks! I agree with these goals. I was hoping to get to consistency by sorting out ideal parsing for a bunch of headers and then abstracting the common patterns as I go. I do think we need to do combining and then parsing as otherwise there can be subtle differences if there's an intermediary that combines, which seems really bad for robustness. Splitting on commas except for those quoted and then removing whitespace around the resulting tokens might be good enough, but it'll need a bunch of tests for various headers to show it's not obviously broken somehow. To make progress here it would help if some of the refactoring PRs I created for Fetch and some of the tests against wpt, such as this one, can land. I could create a tracking bug against Chrome and other browsers to track this effort so we don't miss anything as we go, but also don't require anything to be patched before we are more convinced about the approach? Thoughts on that appreciated. |
@annevk, if this is a large effort that will span many PRs, and keeping track of order/dependency is hard, then you might consider using a GitHub project. We currently have just one in the web-platform-tests org :) |
@MattMenke2, can you review this PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests look pretty thorough to me. One other test we could add is one with a null at the end of nosniff.
fetch/nosniff/parsing-nosniff.html
Outdated
@@ -3,7 +3,7 @@ | |||
<div id=log></div> | |||
<script> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mind adding a comment, while you're here, on just what this test is doing? It looks to me like an accurate description would be something like:
Tests whether variations of a "X-Content-Type-Options: nosniff" header can be parsed as such. When the response header fails to be parsed as a valid nosniff header, the advertised Content-Type is not recognized as a possible script, and the script's onerror handler is invoked. When the nosniff header is not parsed as a valid nosniff header, the MIME type of the response is sniffed as text, and the onload handler is run.
HTTP/1.1 200 YOU HAVE NO POWER HERE | ||
Content-Length: 22 | ||
Content-Type: x/x | ||
X-Content-Type-options: nosniff,,@#$#%%&^&^*()()11! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could also do a weird tokens test with the "@#$#%%&^&^*()()11!" first, to make sure their headers with weird tokens aren't just ignored. Could additionally / alternatively do something similar for the tabulation test as well, adding another nosniff after the comma.
HTTP/1.1 200 YOU HAVE NO POWER HERE | ||
Content-Length: 22 | ||
Content-Type: x/x | ||
X-Content-Type-options: nosniff, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe call this vertical-tab instead of line-tabulation? At least I've always seen it caller vertical tab when looking at ascii tables and the like.
HTTP/1.1 200 YOU HAVE NO POWER HERE | ||
Content-Length: 22 | ||
Content-Type: x/x | ||
X-Content-Type-options: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is options deliberately lowercase in these tests? Shouldn't hurt, but as that's not what they're aimed at testing, should probably capitalize it.
HTTP/1.1 200 YOU HAVE NO POWER HERE | ||
Content-Length: 22 | ||
Content-Type: x/x | ||
X-Content-Type-options: nosniff, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we get anything from the comma in this one?
Thanks, those are good cases to test. I should probably rewrite this and instead of using of using |
Thanks again for the review. I've now restructured the tests to be similar to the Content-Length tests and presumably all future header tests. I also think that this new setup negates the need for a comment, but let me know if you want one anyway. I haven't tested 0x00 because of whatwg/xhr#165. Implementations haven't been updated (some tests did land), but there we reached the conclusion that such a byte in the header block should result in a network error. |
"nosniff": false | ||
}, | ||
{ | ||
"input": "X-Content-Type-Options: nosniff\r\nX-Content-Type-Options: no", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Should probably use consistent indentation for all of these (x3)
async_test(t => { | ||
const script = document.createElement("script"); | ||
t.add_cleanup(() => script.remove()); | ||
if (testData.nosniff) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I live completely in network land, so am far from knowledgeable about Javascript or the web platform, beyond HTTP (Admittedly, in Chrome, that also includes the MIME sniffing logic), but looking at the updated test, I still find it non-obvious why the nosniff results in errors. The sniffed as text -> treated as valid Javascipt connection isn't obvious to me.
That having been said, I do think that this makes it much clearer that the onerror cases correspond to nosniff being respected.
Maybe just something along the lines of "The resource must be sniffed as test to be successfully loaded as a script."
I fixed the formatting issues, flipped the expectation for 0x0C (see whatwg/fetch#818 (comment) though), and added a comment explaining how the |
And add some of the infrastructure needed to define parsing better for all headers going forward (needed for #814). Fixes #752. This also fixes an issue with CORB as it simply assumed an X-Content-Type-Options was present. Tests: web-platform-tests/wpt#13559.
No description provided.