HTTP protocol #204
Conversation
Codecov Report
Additional details and impacted files

Coverage Diff

|          | feat/http | #204   | +/-    |
|----------|-----------|--------|--------|
| Coverage | 68.16%    | 70.43% | +2.27% |
| Files    | 63        | 64     | +1     |
| Lines    | 5443      | 5496   | +53    |
| Hits     | 3710      | 3871   | +161   |
| Misses   | 1533      | 1409   | -124   |
| Partials | 200       | 216    | +16    |
I'm pivoting here a little, refactoring the graphsync retriever so we can share the majority of the code between the two retrievers, since the primary flow we need is the same but the actual retrieving is different. In fact, it's almost close enough that you could just replace the "client" and they'd be the same, but not quite. But the whole mechanics of managing the parallel fetches, coordinating them, even selecting which one to try next (as per #197), is very close, and I'm finding myself copying and pasting a lot of code to bring this up to scratch already.
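To illustrate what I mean by swapping the "client", here's a very rough, hypothetical Go sketch (none of these names are the real Lassie types): the shared coordination logic only ever talks to a small protocol-specific interface.

```go
package retriever

import "context"

// candidate is a placeholder for a retrieval candidate (provider + root).
type candidate struct {
	Provider string
	RootCid  string
}

// retrievalClient is the hypothetical seam: the one piece that would differ
// between the graphsync retriever and the HTTP retriever.
type retrievalClient interface {
	// Connect establishes whatever "connection" makes sense for the
	// protocol (a libp2p dial for graphsync; maybe a no-op, or the request
	// itself, for HTTP).
	Connect(ctx context.Context, c candidate) error
	// Retrieve performs the actual transfer from the candidate.
	Retrieve(ctx context.Context, c candidate) error
}

// retrieve is the shared flow. For brevity it walks the candidates one by
// one; the real flow coordinates parallel fetches and picks which candidate
// to try next, but the point is that it only ever talks to retrievalClient,
// so both retrievers could reuse it unchanged.
func retrieve(ctx context.Context, client retrievalClient, candidates []candidate) error {
	var lastErr error
	for _, c := range candidates {
		if err := client.Connect(ctx, c); err != nil {
			lastErr = err
			continue
		}
		if err := client.Retrieve(ctx, c); err != nil {
			lastErr = err
			continue
		}
		return nil
	}
	return lastErr
}
```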
Pushed a new update - my second (but very rough) try at sharing logic between the graphsyncretriever and the http one. My initial attempt got too messy, trying to make an abstract shared type that could be extended by two different retrievers. So my new tack was to just flesh out the graphsyncretriever more so it can serve both protocols, just with a different constructor for each. I'll try and disentangle it more (the ...). The flow is basically the same between the two, with these differences:
Refactored this into a form that I don't hate and that I think deals with the concerns quite well.
So now, the ... We still have the weirdness about what a "Connect" means for HTTP, but that question is separate from this abstraction and we need to answer it at some point. For now I'm still going with "start a request from every candidate as soon as we get them" and then "read the body from one candidate at a time until we get what we need".
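A hedged sketch of that strategy in plain Go, just to show the shape (not the branch's actual code, and error handling is simplified): every candidate's request is started immediately, but the response bodies are consumed one at a time.

```go
package httpfetch

import (
	"context"
	"fmt"
	"io"
	"net/http"
)

// fetchFirstSuccess starts an HTTP request to every candidate URL as soon as
// the candidates are known, then reads the response bodies one at a time,
// stopping at the first candidate that returns a usable body.
func fetchFirstSuccess(ctx context.Context, urls []string) ([]byte, error) {
	type pending struct {
		resp *http.Response
		err  error
	}
	started := make([]chan pending, len(urls))

	// Phase 1: kick off every request in parallel (the HTTP analogue of
	// "start a request from every candidate as soon as we get them").
	for i, u := range urls {
		ch := make(chan pending, 1)
		started[i] = ch
		go func(u string, ch chan pending) {
			req, err := http.NewRequestWithContext(ctx, http.MethodGet, u, nil)
			if err != nil {
				ch <- pending{err: err}
				return
			}
			resp, err := http.DefaultClient.Do(req)
			ch <- pending{resp: resp, err: err}
		}(u, ch)
	}

	// Phase 2: read one body at a time until we get what we need.
	var lastErr error
	for _, ch := range started {
		p := <-ch
		if p.err != nil {
			lastErr = p.err
			continue
		}
		body, readErr := io.ReadAll(p.resp.Body)
		p.resp.Body.Close()
		if readErr != nil {
			lastErr = readErr
			continue
		}
		if p.resp.StatusCode != http.StatusOK {
			lastErr = fmt.Errorf("candidate returned status %d", p.resp.StatusCode)
			continue
		}
		return body, nil
	}
	return nil, lastErr
}
```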
Aaaaand it lives. Working unit test added; so far just a simple single-peer fetch, with a mocked http Client so it doesn't touch a server: it just gets the bytes of a CAR, and we expect to get the same blocks into our client LinkSystem in the same order. See httpretriever_test.go. In theory, if you were to compile a ...
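The real test is in httpretriever_test.go; as a generic illustration of the mocking approach only, an `*http.Client` can be given a custom `RoundTripper` so nothing touches the network (the CAR bytes and URL below are placeholders):

```go
package httpfetch

import (
	"bytes"
	"io"
	"net/http"
	"testing"
)

// roundTripFunc lets a plain function act as an http.RoundTripper, so an
// *http.Client can be stubbed without ever touching a real server.
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(req *http.Request) (*http.Response, error) {
	return f(req)
}

func TestMockedCARFetch(t *testing.T) {
	// carBytes stands in for serialized CAR data; a real test would build it
	// from known blocks and then check the blocks land in the LinkSystem in
	// the same order.
	carBytes := []byte("not-a-real-car")

	client := &http.Client{
		Transport: roundTripFunc(func(req *http.Request) (*http.Response, error) {
			return &http.Response{
				StatusCode: http.StatusOK,
				Body:       io.NopCloser(bytes.NewReader(carBytes)),
				Header:     make(http.Header),
				Request:    req,
			}, nil
		}),
	}

	resp, err := client.Get("http://example.invalid/ipfs/placeholder-cid")
	if err != nil {
		t.Fatal(err)
	}
	defer resp.Body.Close()

	got, err := io.ReadAll(resp.Body)
	if err != nil {
		t.Fatal(err)
	}
	if !bytes.Equal(got, carBytes) {
		t.Fatalf("unexpected body: %q", got)
	}
}
```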
Very exciting!
I tested this branch against https://freeway.dag.haus and cid+path works, as does ... The only minor snag I hit was ...
More here: storacha/freeway#34 (comment)
I wonder if you can use the same p2p identity as eipfs for now? I suspect the http transport client isn't validating that against anything at the moment.
If you want to run any tests against freeway, anything recently uploaded to web3.storage or via the new w3up api (cli here) should be available via https://freeway.dag.haus ... but it's dag root only. Fetching via a non-root block cid isn't supported via that gateway yet.
@willscott that's what I did in storacha/freeway#34 (comment) (it works)
Sorry, have tuned out notifications for a few days, catching up on some backlog.
This is because we're heavily peer-ID-centric in the Lassie codebase. You can just stick a random peer ID on the end of your multiaddr for ... Mainly we use that for internal disambiguation between parallel fetches (only relevant for ...).
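For reference, a rough sketch of generating a throwaway peer ID and tacking it onto a multiaddr, assuming the usual go-libp2p and go-multiaddr packages (the freeway multiaddr form here is just a guess):

```go
package main

import (
	"crypto/rand"
	"fmt"

	"github.com/libp2p/go-libp2p/core/crypto"
	"github.com/libp2p/go-libp2p/core/peer"
	"github.com/multiformats/go-multiaddr"
)

func main() {
	// Generate a throwaway identity just to satisfy the peer-ID-centric
	// plumbing; the HTTP endpoint never has to prove ownership of it.
	priv, _, err := crypto.GenerateEd25519Key(rand.Reader)
	if err != nil {
		panic(err)
	}
	pid, err := peer.IDFromPrivateKey(priv)
	if err != nil {
		panic(err)
	}

	// Append /p2p/<peer-id> to the provider's multiaddr.
	base, err := multiaddr.NewMultiaddr("/dns4/freeway.dag.haus/tcp/443/https")
	if err != nil {
		panic(err)
	}
	withID, err := multiaddr.NewMultiaddr(base.String() + "/p2p/" + pid.String())
	if err != nil {
		panic(err)
	}
	fmt.Println(withID)
}
```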
Force-pushed from f6e8366 to a8cc6f8.
* based on bitswap unit test framework
* too-detailed http testing, will remove this in favour of graphsync-style testing
So the tests are complete enough for now; there's not much more I can test for this basic functionality. All the most interesting stuff will come with #195. So I'm going to merge this into ... @olizilla I should have the basics of #195 in place in the next few days; that's when things might get more interesting, because right now, using lassie on this branch is only really testing that you have an HTTP endpoint that responds to a certain path string with a valid CAR, and that's it. We don't care whether it's a v1 or v2, what it says its roots are, or even what, or how many, blocks are in it. #193 will bring in all the strictness we need according to ipfs/specs#402, which is where discrepancies or deficiencies in implementations will become more obvious.
This is branching from `feat/http`, which is in #193; using this to make my specific revisions. Working toward #195.

Initial changes: `RetrievalRequest` and `CarScope` themselves are used to figure out which selector to use (`GetSelector()`) and what the path & querystring are for the HTTP request (`GetUrlPath()`, which can `error`), plus `CarScope#TerminalSelectorSpec()` to help with selector building and `CarScope#AcceptHeader()` to generate the appropriate `Accept` for that scope.

I'll continue to keep notes in here as I go.
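Roughly how those pieces might fit together on the HTTP side; the accessor names come from the notes above, but the exact signatures, which type each accessor hangs off, and all the stand-in types below are guesses:

```go
package retriever

import (
	"context"
	"net/http"
)

// retrievalRequest is a stand-in for the real RetrievalRequest; only the
// accessors mentioned above are assumed, with guessed signatures.
type retrievalRequest interface {
	GetUrlPath() (string, error) // path + querystring for the HTTP request
	// GetSelector() would likewise be used on the graphsync side.
}

// carScope is a stand-in for the real CarScope.
type carScope interface {
	AcceptHeader() string // the Accept value appropriate for this scope
}

// buildHTTPRequest shows how the URL path and Accept header described above
// could be combined against a candidate's base URL.
func buildHTTPRequest(ctx context.Context, baseURL string, req retrievalRequest, scope carScope) (*http.Request, error) {
	urlPath, err := req.GetUrlPath()
	if err != nil {
		return nil, err
	}
	httpReq, err := http.NewRequestWithContext(ctx, http.MethodGet, baseURL+urlPath, nil)
	if err != nil {
		return nil, err
	}
	httpReq.Header.Set("Accept", scope.AcceptHeader())
	return httpReq, nil
}
```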