Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fetch: test nosniff parsing better #13559

Merged
merged 6 commits into from
Nov 1, 2018
Merged

Fetch: test nosniff parsing better #13559

merged 6 commits into from
Nov 1, 2018

Conversation

annevk
Copy link
Member

@annevk annevk commented Oct 17, 2018

No description provided.

@annevk
Copy link
Member Author

annevk commented Oct 17, 2018

@MattMenke2 I guess what you noticed in Chrome about empty headers (which I hadn't noticed since they are serialized) is indeed true. E.g., in this nosniff test Chrome ends up ignoring empty values altogether. Firefox only does it when they are separate header lines.

@MattMenke2
Copy link
Member

Except for certain headers, Chrome unconditionally splits header lines around commas into multiple header values, unconditionally allowing commas between both single and double quoted strings. This code also ignores empty values. A lot of parsers use the values output by this method, instead of the actual header lines we received, to avoid needing redundant code to split headers around commas.

It does mostly allow retrieving the original header lines, unmodified, though even there, believe it will remove the leading and trailing commas (Since it goes from the start of the first parsed value to the end of the last, but leading/trailing commas don't add a value).

@annevk
Copy link
Member Author

annevk commented Oct 17, 2018

That kind of generic handling makes sense to me, but it seems wrong to drop empty values as you argued elsewhere (though before discovering Chrome did so). It also seems wrong to have single quoted strings as they are not a thing in HTTP I think?

Irrespective of what Chrome does today, what do you think the future should look like here?

annevk added a commit to whatwg/fetch that referenced this pull request Oct 17, 2018
And add some of the infrastructure needed to define parsing better for all headers going forward (needed for #814). Fixes #752.

This also fixes an issue with CORB as it simply assumed an X-Content-Type-Options was present.

Tests: web-platform-tests/wpt#13559.
@MattMenke2
Copy link
Member

The problem with generic quote handling is that Content-Type, at least, specifies its own quote handling, which isn't compatible with it:

Content-Type: text/html; foo= ","; charset=GBK

If you split around commas with quotes and without quote handling, you get different results. Content-Type rules don't consider that a quoted string. Of course, it could be that Content-Type should be considered a non-coalescing header, which resolves the issue, but anything that only allows quoted strings in certain parts of the header will potentially have similar problems., if commas should also separate multiple values.

I'd really like to be able to have some unambiguous rules for how to handle these situations - merge all instances of a header, separated by commas, works, though it then means that we either need a general way to separate in values around commas, independent of header, or Chrome's approach doesn't work. Such rules don't currently exist, and I feel that specifying quote handling on a per-header basis isn't compatible with rules of that type, making everything essentially a non-coalescing header. If common code to split lines doesn't work, that means a lot more code to handle each individual header type, which seems more likely to result in bugs and inconsistent handling of headers. It has been looking to me like this is the direction we're going in (Which may be necessary, and I can live with, but does mean Chrome's current approach here is wrong).

I'd also like to have only a few parsing rules apply to all headers. For instance, one rule for everything of the form "Foo: a; b=c; d=e, x; y=z, s, t" - consistent rules for quote handling (Which would apply to everything from Accept-Language to Content-Type, etc).

@annevk
Copy link
Member Author

annevk commented Oct 17, 2018

Thanks! I agree with these goals. I was hoping to get to consistency by sorting out ideal parsing for a bunch of headers and then abstracting the common patterns as I go.

I do think we need to do combining and then parsing as otherwise there can be subtle differences if there's an intermediary that combines, which seems really bad for robustness.

Splitting on commas except for those quoted and then removing whitespace around the resulting tokens might be good enough, but it'll need a bunch of tests for various headers to show it's not obviously broken somehow.

To make progress here it would help if some of the refactoring PRs I created for Fetch and some of the tests against wpt, such as this one, can land. I could create a tracking bug against Chrome and other browsers to track this effort so we don't miss anything as we go, but also don't require anything to be patched before we are more convinced about the approach? Thoughts on that appreciated.

cc @ddragana @domenic @foolip @mnot

@foolip
Copy link
Member

foolip commented Oct 17, 2018

@annevk, if this is a large effort that will span many PRs, and keeping track of order/dependency is hard, then you might consider using a GitHub project. We currently have just one in the web-platform-tests org :)

@foolip
Copy link
Member

foolip commented Oct 17, 2018

@MattMenke2, can you review this PR?

Copy link
Member

@MattMenke2 MattMenke2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests look pretty thorough to me. One other test we could add is one with a null at the end of nosniff.

@@ -3,7 +3,7 @@
<div id=log></div>
<script>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mind adding a comment, while you're here, on just what this test is doing? It looks to me like an accurate description would be something like:

Tests whether variations of a "X-Content-Type-Options: nosniff" header can be parsed as such. When the response header fails to be parsed as a valid nosniff header, the advertised Content-Type is not recognized as a possible script, and the script's onerror handler is invoked. When the nosniff header is not parsed as a valid nosniff header, the MIME type of the response is sniffed as text, and the onload handler is run.

HTTP/1.1 200 YOU HAVE NO POWER HERE
Content-Length: 22
Content-Type: x/x
X-Content-Type-options: nosniff,,@#$#%%&^&^*()()11!
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could also do a weird tokens test with the "@#$#%%&^&^*()()11!" first, to make sure their headers with weird tokens aren't just ignored. Could additionally / alternatively do something similar for the tabulation test as well, adding another nosniff after the comma.

HTTP/1.1 200 YOU HAVE NO POWER HERE
Content-Length: 22
Content-Type: x/x
X-Content-Type-options: nosniff ,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe call this vertical-tab instead of line-tabulation? At least I've always seen it caller vertical tab when looking at ascii tables and the like.

HTTP/1.1 200 YOU HAVE NO POWER HERE
Content-Length: 22
Content-Type: x/x
X-Content-Type-options:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is options deliberately lowercase in these tests? Shouldn't hurt, but as that's not what they're aimed at testing, should probably capitalize it.

HTTP/1.1 200 YOU HAVE NO POWER HERE
Content-Length: 22
Content-Type: x/x
X-Content-Type-options: nosniff ,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we get anything from the comma in this one?

@annevk
Copy link
Member Author

annevk commented Oct 18, 2018

Thanks, those are good cases to test. I should probably rewrite this and instead of using of using .asis, change this to use a Python script so there's less duplication and more test information in a single resource.

@annevk
Copy link
Member Author

annevk commented Oct 19, 2018

Thanks again for the review. I've now restructured the tests to be similar to the Content-Length tests and presumably all future header tests. I also think that this new setup negates the need for a comment, but let me know if you want one anyway.

I haven't tested 0x00 because of whatwg/xhr#165. Implementations haven't been updated (some tests did land), but there we reached the conclusion that such a byte in the header block should result in a network error.

"nosniff": false
},
{
"input": "X-Content-Type-Options: nosniff\r\nX-Content-Type-Options: no",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Should probably use consistent indentation for all of these (x3)

async_test(t => {
const script = document.createElement("script");
t.add_cleanup(() => script.remove());
if (testData.nosniff) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I live completely in network land, so am far from knowledgeable about Javascript or the web platform, beyond HTTP (Admittedly, in Chrome, that also includes the MIME sniffing logic), but looking at the updated test, I still find it non-obvious why the nosniff results in errors. The sniffed as text -> treated as valid Javascipt connection isn't obvious to me.

That having been said, I do think that this makes it much clearer that the onerror cases correspond to nosniff being respected.

Maybe just something along the lines of "The resource must be sniffed as test to be successfully loaded as a script."

@annevk
Copy link
Member Author

annevk commented Oct 30, 2018

I fixed the formatting issues, flipped the expectation for 0x0C (see whatwg/fetch#818 (comment) though), and added a comment explaining how the script element is used here.

@annevk annevk merged commit 10f708e into master Nov 1, 2018
@annevk annevk deleted the annevk/nosniff-parsing branch November 1, 2018 09:12
annevk added a commit to whatwg/fetch that referenced this pull request Nov 1, 2018
And add some of the infrastructure needed to define parsing better for all headers going forward (needed for #814). Fixes #752.

This also fixes an issue with CORB as it simply assumed an X-Content-Type-Options was present.

Tests: web-platform-tests/wpt#13559.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants