Fetch: test nosniff parsing better #13559

annevk · 2018-10-17T09:58:31Z

No description provided.

annevk · 2018-10-17T10:00:18Z

@MattMenke2 I guess what you noticed in Chrome about empty headers (which I hadn't noticed since they are serialized) is indeed true. E.g., in this nosniff test Chrome ends up ignoring empty values altogether. Firefox only does it when they are separate header lines.

MattMenke2 · 2018-10-17T12:18:18Z

Except for certain headers, Chrome unconditionally splits header lines around commas into multiple header values, unconditionally allowing commas between both single and double quoted strings. This code also ignores empty values. A lot of parsers use the values output by this method, instead of the actual header lines we received, to avoid needing redundant code to split headers around commas.

It does mostly allow retrieving the original header lines, unmodified, though even there, believe it will remove the leading and trailing commas (Since it goes from the start of the first parsed value to the end of the last, but leading/trailing commas don't add a value).

… for MIME types anyway

annevk · 2018-10-17T12:34:55Z

That kind of generic handling makes sense to me, but it seems wrong to drop empty values as you argued elsewhere (though before discovering Chrome did so). It also seems wrong to have single quoted strings as they are not a thing in HTTP I think?

Irrespective of what Chrome does today, what do you think the future should look like here?

And add some of the infrastructure needed to define parsing better for all headers going forward (needed for #814). Fixes #752. This also fixes an issue with CORB as it simply assumed an X-Content-Type-Options was present. Tests: web-platform-tests/wpt#13559.

MattMenke2 · 2018-10-17T13:02:19Z

The problem with generic quote handling is that Content-Type, at least, specifies its own quote handling, which isn't compatible with it:

Content-Type: text/html; foo= ","; charset=GBK

If you split around commas with quotes and without quote handling, you get different results. Content-Type rules don't consider that a quoted string. Of course, it could be that Content-Type should be considered a non-coalescing header, which resolves the issue, but anything that only allows quoted strings in certain parts of the header will potentially have similar problems., if commas should also separate multiple values.

I'd really like to be able to have some unambiguous rules for how to handle these situations - merge all instances of a header, separated by commas, works, though it then means that we either need a general way to separate in values around commas, independent of header, or Chrome's approach doesn't work. Such rules don't currently exist, and I feel that specifying quote handling on a per-header basis isn't compatible with rules of that type, making everything essentially a non-coalescing header. If common code to split lines doesn't work, that means a lot more code to handle each individual header type, which seems more likely to result in bugs and inconsistent handling of headers. It has been looking to me like this is the direction we're going in (Which may be necessary, and I can live with, but does mean Chrome's current approach here is wrong).

I'd also like to have only a few parsing rules apply to all headers. For instance, one rule for everything of the form "Foo: a; b=c; d=e, x; y=z, s, t" - consistent rules for quote handling (Which would apply to everything from Accept-Language to Content-Type, etc).

annevk · 2018-10-17T13:15:37Z

Thanks! I agree with these goals. I was hoping to get to consistency by sorting out ideal parsing for a bunch of headers and then abstracting the common patterns as I go.

I do think we need to do combining and then parsing as otherwise there can be subtle differences if there's an intermediary that combines, which seems really bad for robustness.

Splitting on commas except for those quoted and then removing whitespace around the resulting tokens might be good enough, but it'll need a bunch of tests for various headers to show it's not obviously broken somehow.

To make progress here it would help if some of the refactoring PRs I created for Fetch and some of the tests against wpt, such as this one, can land. I could create a tracking bug against Chrome and other browsers to track this effort so we don't miss anything as we go, but also don't require anything to be patched before we are more convinced about the approach? Thoughts on that appreciated.

cc @ddragana @domenic @foolip @mnot

foolip · 2018-10-17T16:01:44Z

@annevk, if this is a large effort that will span many PRs, and keeping track of order/dependency is hard, then you might consider using a GitHub project. We currently have just one in the web-platform-tests org :)

foolip · 2018-10-17T16:02:20Z

@MattMenke2, can you review this PR?

MattMenke2

Tests look pretty thorough to me. One other test we could add is one with a null at the end of nosniff.

MattMenke2 · 2018-10-17T16:37:20Z

fetch/nosniff/parsing-nosniff.html

@@ -3,7 +3,7 @@
 <div id=log></div>
 <script>


Mind adding a comment, while you're here, on just what this test is doing? It looks to me like an accurate description would be something like:

Tests whether variations of a "X-Content-Type-Options: nosniff" header can be parsed as such. When the response header fails to be parsed as a valid nosniff header, the advertised Content-Type is not recognized as a possible script, and the script's onerror handler is invoked. When the nosniff header is not parsed as a valid nosniff header, the MIME type of the response is sniffed as text, and the onload handler is run.

MattMenke2 · 2018-10-17T16:45:41Z

fetch/nosniff/resources/nosniff-weird-tokens.asis

+HTTP/1.1 200 YOU HAVE NO POWER HERE
+Content-Length: 22
+Content-Type: x/x
+X-Content-Type-options: nosniff,,@#$#%%&^&^*()()11!


Could also do a weird tokens test with the "@#$#%%&^&^*()()11!" first, to make sure their headers with weird tokens aren't just ignored. Could additionally / alternatively do something similar for the tabulation test as well, adding another nosniff after the comma.

MattMenke2 · 2018-10-17T16:49:31Z

fetch/nosniff/resources/nosniff-line-tabulation.asis

+HTTP/1.1 200 YOU HAVE NO POWER HERE
+Content-Length: 22
+Content-Type: x/x
+X-Content-Type-options: nosniff,


Maybe call this vertical-tab instead of line-tabulation? At least I've always seen it caller vertical tab when looking at ascii tables and the like.

MattMenke2 · 2018-10-17T16:51:08Z

fetch/nosniff/resources/nosniff-first-empty-double.asis

+HTTP/1.1 200 YOU HAVE NO POWER HERE
+Content-Length: 22
+Content-Type: x/x
+X-Content-Type-options:


Is options deliberately lowercase in these tests? Shouldn't hurt, but as that's not what they're aimed at testing, should probably capitalize it.

MattMenke2 · 2018-10-17T16:52:47Z

fetch/nosniff/resources/nosniff-line-tabulation.asis

+HTTP/1.1 200 YOU HAVE NO POWER HERE
+Content-Length: 22
+Content-Type: x/x
+X-Content-Type-options: nosniff,


Do we get anything from the comma in this one?

annevk · 2018-10-18T12:36:43Z

Thanks, those are good cases to test. I should probably rewrite this and instead of using of using .asis, change this to use a Python script so there's less duplication and more test information in a single resource.

annevk · 2018-10-19T09:19:52Z

Thanks again for the review. I've now restructured the tests to be similar to the Content-Length tests and presumably all future header tests. I also think that this new setup negates the need for a comment, but let me know if you want one anyway.

I haven't tested 0x00 because of whatwg/xhr#165. Implementations haven't been updated (some tests did land), but there we reached the conclusion that such a byte in the header block should result in a network error.

MattMenke2 · 2018-10-19T15:25:38Z

fetch/nosniff/resources/x-content-type-options.json

+  	"nosniff": false
+  },
+  {
+    "input": "X-Content-Type-Options: nosniff\r\nX-Content-Type-Options: no",


nit: Should probably use consistent indentation for all of these (x3)

MattMenke2 · 2018-10-19T15:32:22Z

fetch/nosniff/parsing-nosniff.window.js

+    async_test(t => {
+      const script = document.createElement("script");
+      t.add_cleanup(() => script.remove());
+      if (testData.nosniff) {


I live completely in network land, so am far from knowledgeable about Javascript or the web platform, beyond HTTP (Admittedly, in Chrome, that also includes the MIME sniffing logic), but looking at the updated test, I still find it non-obvious why the nosniff results in errors. The sniffed as text -> treated as valid Javascipt connection isn't obvious to me.

That having been said, I do think that this makes it much clearer that the onerror cases correspond to nosniff being respected.

Maybe just something along the lines of "The resource must be sniffed as test to be successfully loaded as a script."

annevk · 2018-10-30T12:39:10Z

I fixed the formatting issues, flipped the expectation for 0x0C (see whatwg/fetch#818 (comment) though), and added a comment explaining how the script element is used here.

And add some of the infrastructure needed to define parsing better for all headers going forward (needed for #814). Fixes #752. This also fixes an issue with CORB as it simply assumed an X-Content-Type-Options was present. Tests: web-platform-tests/wpt#13559.

Fetch: test nosniff parsing better

a3241bd

wpt-pr-bot added the fetch label Oct 17, 2018

wpt-pr-bot requested review from domfarolino, jdm, mnot, youennf and yutakahirano October 17, 2018 09:58

test form feeds

7dd984b

meh, let's just require stripping form feed as well, we already do it…

ee3c287

… for MIME types anyway

annevk mentioned this pull request Oct 17, 2018

Define parsing for X-Content-Type-Options in detail whatwg/fetch#818

Merged

also test U+000B

18280c2

MattMenke2 reviewed Oct 17, 2018

View reviewed changes

use a reusable framework instead

fd3b2fa

MattMenke2 reviewed Oct 19, 2018

View reviewed changes

address feedback and stop treating 0x0C as whitespace

882ac0b

MattMenke2 approved these changes Oct 30, 2018

View reviewed changes

annevk merged commit 10f708e into master Nov 1, 2018

annevk deleted the annevk/nosniff-parsing branch November 1, 2018 09:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fetch: test nosniff parsing better #13559

Fetch: test nosniff parsing better #13559

annevk commented Oct 17, 2018

annevk commented Oct 17, 2018

MattMenke2 commented Oct 17, 2018

annevk commented Oct 17, 2018

MattMenke2 commented Oct 17, 2018

annevk commented Oct 17, 2018

foolip commented Oct 17, 2018

foolip commented Oct 17, 2018

MattMenke2 left a comment

MattMenke2 Oct 17, 2018

MattMenke2 Oct 17, 2018

MattMenke2 Oct 17, 2018

MattMenke2 Oct 17, 2018

MattMenke2 Oct 17, 2018

annevk commented Oct 18, 2018

annevk commented Oct 19, 2018

MattMenke2 Oct 19, 2018

MattMenke2 Oct 19, 2018

annevk commented Oct 30, 2018

Fetch: test nosniff parsing better #13559

Fetch: test nosniff parsing better #13559

Conversation

annevk commented Oct 17, 2018

annevk commented Oct 17, 2018

MattMenke2 commented Oct 17, 2018

annevk commented Oct 17, 2018

MattMenke2 commented Oct 17, 2018

annevk commented Oct 17, 2018

foolip commented Oct 17, 2018

foolip commented Oct 17, 2018

MattMenke2 left a comment

Choose a reason for hiding this comment

MattMenke2 Oct 17, 2018

Choose a reason for hiding this comment

MattMenke2 Oct 17, 2018

Choose a reason for hiding this comment

MattMenke2 Oct 17, 2018

Choose a reason for hiding this comment

MattMenke2 Oct 17, 2018

Choose a reason for hiding this comment

MattMenke2 Oct 17, 2018

Choose a reason for hiding this comment

annevk commented Oct 18, 2018

annevk commented Oct 19, 2018

MattMenke2 Oct 19, 2018

Choose a reason for hiding this comment

MattMenke2 Oct 19, 2018

Choose a reason for hiding this comment

annevk commented Oct 30, 2018