Experimental download code #1627
base: download-experiment
This is ready for a quick look to make sure that there are no bugs or inefficiencies in the downloader code and method.
If I'm not mistaken, One thing from the graphs that is unclear to me, that maybe you have an explanation for: I can sort of get it for the first approach (there might be quite a few extra queries until we find a peer). There are a few things that we can try (some of them might not be trivial) to speed up the whole thing, but I don't think we can get a 10x improvement. Some ideas:
Happy to chat and brainstorm some ideas in more detail.
Also, what is the regular Trin client doing in the background? If it is a new client with an empty db, it might get spammed with offers all the time, which would definitely affect performance.
Good catch. This didn't make sense to me, so I started investigating and found a bug in the overlay service, which is sending 4x more FindContent requests than expected. I am looking into the exact reason and how to fix it.
Ok, so the multiple FindContent requests are not a bug; they come from the validation of bodies and receipts. To validate that content, we send a recursive find content query to get the header from the network. So it is one request for the body and 3 requests for the header, because the query parallelism is 3. Anyway, I disabled all validation and ran the test again. The run without census still performs slightly better than the one with census. Here are the results:
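The 4x figure follows directly from the numbers above. A minimal sketch of the arithmetic, assuming one direct request per content item plus one header request per parallel query; the function name and signature are hypothetical and not taken from the Trin code base:

```rust
/// Hypothetical helper (not a Trin API): number of FindContent requests sent
/// for one body or receipts lookup.
///
/// Assumption from the comment above: validation triggers a recursive find
/// content query for the header, which fans out to `query_parallelism` peers.
fn find_content_requests_per_item(validation_enabled: bool, query_parallelism: u32) -> u32 {
    // One request for the content item itself (body or receipts).
    let content_requests = 1;
    // Header lookups only happen when validation is enabled.
    let header_requests = if validation_enabled { query_parallelism } else { 0 };
    content_requests + header_requests
}
```

With validation enabled and a query parallelism of 3, `find_content_requests_per_item(true, 3)` gives 1 + 3 = 4 requests, which matches the observed 4x. With validation disabled it drops back to 1.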
What additional questions do you still have to answer on this task?
It is mostly done for now. Even if we do some of the optimizations Milos proposed, I'm not expecting a huge performance boost. It is good to have some numbers, and 4-5 blocks per second is not comparable to what geth can accomplish over devp2p. However, I'm still optimistic about range queries and the improvement they can bring. I think the next step is implementing range queries and running those tests again 😄. What do you think?
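To put 4-5 blocks per second in perspective, here is a rough back-of-the-envelope sketch for the block range given in the PR description (14,000,000 to 15,537,393). The constant and function names are hypothetical, and the estimate assumes the rate stays constant for the whole range:

```rust
// Block range from the PR description; names are illustrative only.
const START_BLOCK: u64 = 14_000_000;
const MERGE_BLOCK: u64 = 15_537_393;

/// Number of blocks in the range (treating the end as exclusive; an
/// off-by-one here does not change the estimate in any meaningful way).
fn block_count() -> u64 {
    MERGE_BLOCK - START_BLOCK
}

/// Estimated wall-clock days to download `blocks` at a constant rate.
fn estimated_days(blocks: u64, blocks_per_second: f64) -> f64 {
    const SECONDS_PER_DAY: f64 = 86_400.0;
    blocks as f64 / blocks_per_second / SECONDS_PER_DAY
}
```

Under these assumptions, `estimated_days(block_count(), 5.0)` is about 3.6 days and `estimated_days(block_count(), 4.0)` is about 4.4 days for roughly 1.5 million blocks.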
Merging my experimental download code to a new branch.
The code loads the CSV file with block hashes from block 14,000,000 until The Merge (15,537,393) and requests block bodies and receipts from the network in batches of a configurable size. When all data from the current batch is downloaded, it proceeds to the next batch of blocks.
To switch between find content with census and peer scoring and recursive find content queries without census, set the CENSUS const (true for census, false for no census).
A hacky `OfferReport` is used for peer scoring as a shortcut, but it can also represent a FindContent failure.
The CSV data set can be downloaded from here and needs to be placed in the `trin-history/src` folder.
To run the downloader, just start Trin with the history network enabled.
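The batching flow described above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: the CSV layout (hash in the first column, `0x` prefix), the `Strategy` enum, and every function name are assumptions, the network call is a placeholder, and items within a batch are fetched sequentially here for brevity.

```rust
/// Mirrors the CENSUS const described above: true for census, false for no census.
const CENSUS: bool = false;

/// Hypothetical enum naming the two lookup modes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Strategy {
    CensusWithPeerScoring,
    RecursiveFindContent,
}

fn strategy() -> Strategy {
    if CENSUS {
        Strategy::CensusWithPeerScoring
    } else {
        Strategy::RecursiveFindContent
    }
}

/// Parses block hashes from CSV text.
/// Assumption: the hash is the first column and starts with "0x";
/// any other line (for example a header row) is skipped.
fn parse_block_hashes(csv: &str) -> Vec<String> {
    csv.lines()
        .filter_map(|line| line.split(',').next())
        .map(str::trim)
        .filter(|hash| hash.starts_with("0x"))
        .map(str::to_owned)
        .collect()
}

/// Placeholder for the network lookup of one block's body and receipts.
/// A real implementation would issue FindContent requests here.
fn fetch_body_and_receipts(block_hash: &str, strategy: Strategy) -> Result<(), String> {
    let _ = (block_hash, strategy);
    Ok(())
}

/// Processes the hashes batch by batch and only moves on once every block in
/// the current batch has been fetched. Returns the number of completed batches.
fn download_in_batches(hashes: &[String], batch_size: usize) -> Result<usize, String> {
    if batch_size == 0 {
        return Err("batch_size must be greater than zero".to_owned());
    }
    let mut completed_batches = 0;
    for batch in hashes.chunks(batch_size) {
        for hash in batch {
            fetch_body_and_receipts(hash, strategy())?;
        }
        completed_batches += 1;
    }
    Ok(completed_batches)
}
```

A caller would read the CSV into a string, pass it to `parse_block_hashes`, and hand the result to `download_in_batches` together with the chosen batch size. Flipping `CENSUS` changes which strategy every lookup uses.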
Notes:
Here are some metrics:
Without Census:
With Census (full view of the network and peer scoring):
`BATCH_SIZE` is 20-30 blocks per batch (40-60 content items, since each block needs a body and receipts). Bumping this number higher than 40-50 blocks will cause Trin's uTP to stall.
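The relationship between the batch size, the number of content items in flight, and the number of batches needed for the whole range can be sketched as below. The names are hypothetical, and the factor of 2 assumes one body and one receipts item per block, as in the note above:

```rust
/// Assumption: each block contributes two content items (body and receipts).
const CONTENT_ITEMS_PER_BLOCK: usize = 2;

/// Content items requested in one batch of `batch_size` blocks.
fn content_items_per_batch(batch_size: usize) -> usize {
    batch_size * CONTENT_ITEMS_PER_BLOCK
}

/// Number of batches needed to cover `total_blocks`, rounding up so the
/// final partial batch is counted. `batch_size` must be non-zero.
fn batch_count(total_blocks: usize, batch_size: usize) -> usize {
    (total_blocks + batch_size - 1) / batch_size
}
```

For example, a batch of 20 blocks means 40 content items and a batch of 30 means 60. Covering the 1,537,393 blocks between 14,000,000 and 15,537,393 takes 76,870 batches at 20 blocks per batch, or 51,247 at 30.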