Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Experimental download code #1627

Open
wants to merge 12 commits into
base: download-experiment
Choose a base branch
from

Conversation

ogenev
Copy link
Member

@ogenev ogenev commented Jan 9, 2025

Merging my experimental download code to a new branch.

The code loads the CSV file with block hashes from block 14,000,000 till The Merge (15,537,393) and starts to request from the network block bodies and receipts in a configurable number of batches. When all data from the current batch is downloaded. it proceeds to the next batch of blocks.

TO switch between find content with census and peer scoring and recursive find content queries without census, set the CENSUS const (true for census, false - no census).
A hacky OfferReport is used for peer scoring as a shortcut but can also represent a FIndContent failure.).

The CSV data set can be downloaded from here and needs to be placed in the trin-history/src folder.

To run the downloader, just start Trin with the history network enabled.

Notes:

  • running the script without a full view of the network and peer scoring is faster. (This may be due to a difference in the recursive find content and native find content implementations in Trin or some inefficiencies in the downloader code. It needs more investigation.)

Here are some metrics,

Without Census:
Screenshot 2025-01-16 at 12 02 09

With Census (full view of the network and peer scoring):
Screenshot 2025-01-16 at 12 03 54

  • a reasonable BATCH_SIZE is 20-30 blocks per batch (40-60 content types), bumping this number higher than 40-50 blocks will cause Trin's uTP to be stale.

@ogenev ogenev changed the title Download experimental code Experimental download code Jan 9, 2025
@ogenev ogenev force-pushed the findcontent-experiment branch from fd8def6 to 194e26c Compare January 9, 2025 12:03
@ogenev
Copy link
Member Author

ogenev commented Jan 16, 2025

This is ready for a quick look to make sure that there is no bugs or inefficiencies in the downloader code and method.

@morph-dev
Copy link
Collaborator

If I'm not mistaken, recursive_find_content has parallelism (I think with factor 3), while find_content with census doesn't.
I think that would explain why it's faster by small margin.

One thing from graphs that is unclear to me, that maybe you have explanation for:
Why is "Total send FindContent requests per second" so much higher than "Downloaded blocks per second"?

I can sort of get it for the first approach (there might be quite a few extra queries until we find a peer).
But it doesn't make sense for census. Aren't we contacting at most 8 peers per block (4 for body + 4 for receipt)?!


There are few things that we can try (some of them might not be trivial) to speed up the whole thing, but I don't think we can get 10x improvements. Some ideas:

  • I think we can do better than fetching in batches, and avoid one long/heavy block delaying many others
    • in my opinion, this has a potential to give us the biggest wins
    • it can be applied on both approaches
  • Census can try to fetch the same content id from 2/3 peers at the time
    • this should close the gap between two approaches (or maybe even make census faster)
  • We can measure peer's throughput on successful responses and apply it on peer scoring (needs changes to peer scoring algorithm, but it shouldn't be too difficult)
    • this might lead to unexpected wins
  • If peers respond with ENRs, we shouldn't score it as failure (just because it didn't have one content, doesn't mean it won't have others)

Happy to chat and brainstorm some ideas in more details.

@morph-dev
Copy link
Collaborator

Also, what is regular trin client doing in the background? If it is new client with empty db, it might get spammed with offers all the time, which would definitely affect performance.

@ogenev
Copy link
Member Author

ogenev commented Jan 17, 2025

One thing from graphs that is unclear to me, that maybe you have explanation for: Why is "Total send FindContent requests per second" so much higher than "Downloaded blocks per second"?

I can sort of get it for the first approach (there might be quite a few extra queries until we find a peer). But it doesn't make sense for census. Aren't we contacting at most 8 peers per block (4 for body + 4 for receipt)?!

Good catch. This didn't make sense to me, so I started investigating and found a bug in the overlay service, which is sending 4x more FindContent requests than expected. I am looking into the exact reason and how to fix it.

@ogenev ogenev force-pushed the findcontent-experiment branch from 941d652 to 695f14b Compare January 20, 2025 14:34
@ogenev
Copy link
Member Author

ogenev commented Jan 23, 2025

Ok, so the multiple FindContent requests thing is not a bug, it is because of the validation of bodies and receipt.

To validate those contents, we send recursive find content to get the header from the network. So it is one request for the body and 3 requests for the header - because the query parallelism is 3.

Anyways, I disabled all validation and ran again the test. Without census still performs slightly better than census.

Here are the results:

Without Census
Screenshot 2025-01-23 at 13 34 41

With Census
image

@pipermerriam
Copy link
Member

What additional questions do you still have to answer on this task?

@ogenev
Copy link
Member Author

ogenev commented Jan 23, 2025

What additional questions do you still have to answer on this task?

It is mostly done for now. Even if we do some of the optimizations Milos proposed, I'm not expecting huge performance boost.

It is good to have some numbers, and 4-5 blocks per second is not comparable to what geth can accomplish over devp2p. However, I'm still optimistic about the range queries and the improvement they can bring.

I think the next step is implementing range queries and run those test again 😄. What do you think?

@ogenev ogenev force-pushed the findcontent-experiment branch from 5d9629f to bc0d4e9 Compare January 29, 2025 11:00
@ogenev ogenev changed the base branch from download-experiment to master January 29, 2025 11:05
@ogenev ogenev changed the base branch from master to download-experiment January 29, 2025 11:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants