Experimental download code #1627

ogenev · 2025-01-09T11:56:24Z

Merging my experimental download code to a new branch.

The code loads the CSV file with block hashes from block 14,000,000 until The Merge (15,537,393) and requests block bodies and receipts from the network in batches of a configurable size. When all data from the current batch has been downloaded, it proceeds to the next batch of blocks.
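The batching loop described above could look roughly like the following. This is a minimal sketch, not the PR's code. The CSV row format (`block_number,block_hash`), the names `parse_hashes`, `download_in_batches` and `ContentKind`, and the `fetch` closure standing in for a FindContent lookup are all hypothetical.

```rust
/// The two content items requested per block (hypothetical naming).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ContentKind {
    BlockBody,
    Receipts,
}

/// Parse an assumed `block_number,block_hash` CSV, skipping malformed rows.
fn parse_hashes(csv: &str) -> Vec<(u64, String)> {
    csv.lines()
        .filter(|line| !line.trim().is_empty())
        .filter_map(|line| {
            let (number, hash) = line.split_once(',')?;
            Some((number.trim().parse::<u64>().ok()?, hash.trim().to_string()))
        })
        .collect()
}

/// Download blocks in batches of `batch_size`; a batch must be complete
/// before the next one starts. `fetch` stands in for a FindContent lookup
/// and returns true on success. Returns the number of attempts per batch.
/// Note: this sketch retries failed items indefinitely.
fn download_in_batches<F>(blocks: &[(u64, String)], batch_size: usize, mut fetch: F) -> Vec<usize>
where
    F: FnMut(&str, ContentKind) -> bool,
{
    assert!(batch_size > 0, "batch size must be positive");
    let mut attempts_per_batch = Vec::new();
    for batch in blocks.chunks(batch_size) {
        // Each block contributes a body and a receipts request.
        let mut pending: Vec<(&str, ContentKind)> = batch
            .iter()
            .flat_map(|(_, hash)| {
                [
                    (hash.as_str(), ContentKind::BlockBody),
                    (hash.as_str(), ContentKind::Receipts),
                ]
            })
            .collect();
        let mut attempts = 0;
        // Keep only the items that failed and retry them in the next round.
        while !pending.is_empty() {
            attempts += pending.len();
            pending.retain(|&(hash, kind)| !fetch(hash, kind));
        }
        attempts_per_batch.push(attempts);
    }
    attempts_per_batch
}
```

In this model, a single slow or failing item keeps the whole batch open, which is exactly the cost of waiting for a batch to finish before moving on.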

To switch between FindContent with census and peer scoring, and recursive FindContent queries without census, set the CENSUS const (true for census, false for no census).
A hacky OfferReport is used for peer scoring as a shortcut, but it can also represent a FindContent failure.
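Below is a rough sketch of the CENSUS switch and of how FindContent outcomes might be folded into success/failure reports for peer scoring. The names (`Strategy`, `FindContentOutcome`, `PeerReport`, `to_report`) are hypothetical. Mapping an ENRs response to success follows the later commit "when Enrs is returned from census node, report it as success".

```rust
/// Hypothetical stand-in for the CENSUS const:
/// true = FindContent via census + peer scoring, false = recursive FindContent.
const CENSUS: bool = true;

#[derive(Debug, PartialEq, Eq)]
enum Strategy {
    CensusWithPeerScoring,
    RecursiveFindContent,
}

/// Pick the lookup strategy from the compile-time switch.
fn strategy() -> Strategy {
    if CENSUS {
        Strategy::CensusWithPeerScoring
    } else {
        Strategy::RecursiveFindContent
    }
}

/// What a single peer returned for a FindContent request (illustrative).
#[derive(Debug)]
enum FindContentOutcome {
    Content,
    Enrs,
    Failure,
}

/// Report fed into peer scoring, standing in for the reused OfferReport.
#[derive(Debug, PartialEq, Eq)]
enum PeerReport {
    Success,
    Failure,
}

/// Map a FindContent outcome onto a peer-scoring report.
/// ENRs count as success: the peer answered, it just lacked this item.
fn to_report(outcome: &FindContentOutcome) -> PeerReport {
    match outcome {
        FindContentOutcome::Content | FindContentOutcome::Enrs => PeerReport::Success,
        FindContentOutcome::Failure => PeerReport::Failure,
    }
}
```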

The CSV data set can be downloaded from here and needs to be placed in the trin-history/src folder.

To run the downloader, just start Trin with the history network enabled.

Notes:

Running the script without a full view of the network and peer scoring is faster. (This may be due to a difference between the recursive find content and native find content implementations in Trin, or to some inefficiencies in the downloader code. It needs more investigation.)

Here are some metrics:

Without Census:

With Census (full view of the network and peer scoring):

A reasonable BATCH_SIZE is 20-30 blocks per batch (40-60 content items, since each block needs a body and receipts). Bumping this number higher than 40-50 blocks will cause Trin's uTP to stall.

ogenev · 2025-01-16T10:26:28Z

This is ready for a quick look to make sure that there are no bugs or inefficiencies in the downloader code and method.

morph-dev · 2025-01-16T22:29:09Z

If I'm not mistaken, recursive_find_content has parallelism (I think with a factor of 3), while find_content with census doesn't.
I think that would explain why it's faster by a small margin.
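To illustrate why parallelism could explain the gap, here is a simplified latency model. It assumes each peer either has the content or not and answers after a fixed latency, and it treats the census lookup as strictly sequential. `alpha = 3` mirrors the parallelism factor mentioned above. The `Probe` type and both functions are hypothetical and ignore real-world details such as timeouts and routing-table updates.

```rust
/// One candidate peer: its response latency and whether it has the content.
struct Probe {
    latency_ms: u64,
    has_content: bool,
}

/// Sequential lookup: ask one peer at a time until one has the content.
/// Returns when the content arrives, or None if no peer has it.
fn sequential_lookup_ms(probes: &[Probe]) -> Option<u64> {
    let mut elapsed = 0;
    for p in probes {
        elapsed += p.latency_ms;
        if p.has_content {
            return Some(elapsed);
        }
    }
    None
}

/// Parallel lookup with `alpha` requests in flight: the next peer is queried
/// as soon as any in-flight request completes; the first hit ends the lookup.
fn parallel_lookup_ms(probes: &[Probe], alpha: usize) -> Option<u64> {
    // Time at which each in-flight slot becomes free.
    let mut slots = vec![0u64; alpha.max(1)];
    let mut first_hit: Option<u64> = None;
    for p in probes {
        // Dispatch on the slot that frees up earliest.
        let (slot, start) = slots
            .iter()
            .copied()
            .enumerate()
            .min_by_key(|&(_, free_at)| free_at)
            .expect("at least one slot");
        // No new requests are sent once the content has already arrived.
        if first_hit.is_some_and(|hit| start >= hit) {
            break;
        }
        let done = start + p.latency_ms;
        slots[slot] = done;
        if p.has_content {
            first_hit = Some(first_hit.map_or(done, |hit| hit.min(done)));
        }
    }
    first_hit
}
```

For three misses at 100 ms followed by a hit at 50 ms, the sequential model needs 350 ms while three parallel requests finish in 150 ms.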

One thing from the graphs is unclear to me; maybe you have an explanation for it:
Why is "Total send FindContent requests per second" so much higher than "Downloaded blocks per second"?

I can sort of get it for the first approach (there might be quite a few extra queries until we find a peer).
But it doesn't make sense for census. Aren't we contacting at most 8 peers per block (4 for body + 4 for receipt)?!

There are a few things we can try to speed up the whole thing (some of them might not be trivial), but I don't think we can get a 10x improvement. Some ideas:

- I think we can do better than fetching in batches, and avoid one long/heavy block delaying many others
  - in my opinion, this has the potential to give us the biggest wins
  - it can be applied to both approaches
- Census can try to fetch the same content id from 2-3 peers at a time
  - this should close the gap between the two approaches (or maybe even make census faster)
- We can measure peers' throughput on successful responses and apply it to peer scoring (this needs changes to the peer scoring algorithm, but it shouldn't be too difficult)
  - this might lead to unexpected wins
- If peers respond with ENRs, we shouldn't score it as a failure (just because a peer didn't have one piece of content doesn't mean it won't have others)
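A toy timing model of the first idea, comparing fixed batches against a sliding window with the same number of concurrent fetches. The per-block latencies are made-up inputs, and `batched_time` / `pipelined_time` are hypothetical helpers. The point is only that in batch mode each batch costs as much as its slowest block.

```rust
/// Total time when blocks are fetched in fixed batches: each batch waits
/// for its slowest block before the next batch starts.
fn batched_time(latencies: &[u64], batch_size: usize) -> u64 {
    latencies
        .chunks(batch_size.max(1))
        .map(|batch| batch.iter().copied().max().unwrap_or(0))
        .sum()
}

/// Total time with a sliding window of `window` concurrent fetches:
/// a new block starts as soon as any in-flight fetch completes.
fn pipelined_time(latencies: &[u64], window: usize) -> u64 {
    // Time at which each concurrent slot becomes free.
    let mut slots = vec![0u64; window.max(1)];
    for &latency in latencies {
        // Reuse the slot that frees up earliest.
        let earliest = slots.iter_mut().min().expect("at least one slot");
        *earliest += latency;
    }
    slots.into_iter().max().unwrap_or(0)
}
```

For example, with latencies `[10, 1, 1, 1, 1, 1, 1, 10]` and 4 concurrent fetches, the batched version takes 20 time units and the sliding window takes 12.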

Happy to chat and brainstorm some ideas in more detail.

morph-dev · 2025-01-16T22:31:40Z

Also, what is the regular Trin client doing in the background? If it is a new client with an empty DB, it might get spammed with offers all the time, which would definitely affect performance.

ogenev · 2025-01-17T10:40:34Z

> One thing from graphs that is unclear to me, that maybe you have explanation for: Why is "Total send FindContent requests per second" so much higher than "Downloaded blocks per second"?
>
> I can sort of get it for the first approach (there might be quite a few extra queries until we find a peer). But it doesn't make sense for census. Aren't we contacting at most 8 peers per block (4 for body + 4 for receipt)?!

Good catch. This didn't make sense to me, so I started investigating and found a bug in the overlay service, which is sending 4x more FindContent requests than expected. I am looking into the exact reason and how to fix it.

ogenev · 2025-01-23T12:07:07Z

OK, so the extra FindContent requests are not a bug; they come from the validation of bodies and receipts.

To validate that content, we send a recursive find content query to get the header from the network. So there is one request for the body and 3 requests for the header, because the query parallelism is 3.
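The arithmetic behind the 4x factor can be written out as a tiny model. It assumes each block needs two content items (body and receipts), and that each item costs one FindContent request plus, when validation is on, one recursive header lookup that sends `header_parallelism` requests. The function name is hypothetical.

```rust
/// Hypothetical model of FindContent requests sent per block.
fn find_content_requests_per_block(validate: bool, header_parallelism: u32) -> u32 {
    // Each block needs a body and receipts.
    let content_items = 2;
    // One request for the item itself, plus the header lookup when validating.
    let per_item = 1 + if validate { header_parallelism } else { 0 };
    content_items * per_item
}
```

With validation and a parallelism of 3 this gives 8 requests per block versus 2 without validation, which matches the 4x ratio.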

Anyway, I disabled all validation and ran the test again. Without census still performs slightly better than with census.

Here are the results:

Without Census

With Census

pipermerriam · 2025-01-23T14:27:39Z

What additional questions do you still have to answer on this task?

ogenev · 2025-01-23T15:33:30Z

> What additional questions do you still have to answer on this task?

It is mostly done for now. Even if we do some of the optimizations Milos proposed, I'm not expecting a huge performance boost.

It is good to have some numbers, and 4-5 blocks per second is not comparable to what geth can accomplish over devp2p. However, I'm still optimistic about the range queries and the improvement they can bring.
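For scale, here is a back-of-the-envelope estimate of how long the full 14,000,000 to 15,537,393 range would take at 4-5 blocks per second, assuming the range is inclusive and the rate stays constant. The helper names are hypothetical.

```rust
/// First block in the CSV data set.
const START_BLOCK: u64 = 14_000_000;
/// The Merge block, the last block in the data set.
const MERGE_BLOCK: u64 = 15_537_393;

/// Number of blocks in an inclusive range.
fn blocks_in_range(start: u64, end_inclusive: u64) -> u64 {
    end_inclusive - start + 1
}

/// Days needed to download `blocks` at a constant rate.
fn days_to_download(blocks: u64, blocks_per_second: f64) -> f64 {
    blocks as f64 / blocks_per_second / 86_400.0
}
```

That works out to roughly 3.6 days at 5 blocks per second and about 4.4 days at 4 blocks per second.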

I think the next step is implementing range queries and running those tests again 😄. What do you think?


ogenev changed the title ~~Download experimental code~~ Experimental download code Jan 9, 2025

ogenev force-pushed the findcontent-experiment branch from fd8def6 to 194e26c on January 9, 2025 12:03

ogenev requested review from morph-dev, KolbyML, njgheorghita, carver and mrferris January 16, 2025 10:26

ogenev force-pushed the findcontent-experiment branch from 941d652 to 695f14b on January 20, 2025 14:34

ogenev added 12 commits January 29, 2025 12:53

- feat: add content downloader to query data from the network (d0e3df9)
- feat: add metric for total bytes inbound received (578f33d)
- feat: add census with peer scoring to downloader (9b0e3e9)
- feat: enable peer scoring (372811e)
- chore: report number of failed outgoing requests (43f7e6f)
- chore: add downloader metrics (45d4963)
- feat: switch between find content with census and recursive find content queries (0958b91)
- chore: report find content query elapsed time (3506249)
- fix: report inbound stream bytes (fef797f)
- feat: disable history validation (4eee563)
- feat: when Enrs is returned from census node, report it as success (789dd07)
- fix: rebase with ping extensions (bc0d4e9)

ogenev force-pushed the findcontent-experiment branch from 5d9629f to bc0d4e9 on January 29, 2025 11:00

ogenev changed the base branch from download-experiment to master January 29, 2025 11:05

ogenev changed the base branch from master to download-experiment January 29, 2025 11:05
