Commit 0ff27d1

nep

jancionear committed Jan 13, 2025
1 parent b2c75fe commit 0ff27d1
Showing 19 changed files with 1,383 additions and 0 deletions.
1,383 changes: 1,383 additions & 0 deletions neps/nep-bandwidth-scheduler.md


2 comments on commit 0ff27d1

@matejpavlovic

Hi, a disclaimer first. As a newcomer, my background and context are still quite limited (but I'm working on improving that), so my questions and remarks might be a bit naive. Feel free to ignore any or all that you don't find meaningful.

As far as I understand the problem, this seems like a nice, practical solution that works. Here are some points I was thinking about while reading the document:

  1. Is it possible that, under a DoS attack, the queues just keep growing indefinitely? It looks like even if they do, it's probably still better (or at least not worse) than the current solution, where an overload would probably result in missing chunks. I was only wondering if, since the mechanism is already being changed, it could be a good time to implement some hard limit on the queue size, after which receipts would be explicitly dropped (and the corresponding transactions would fail).
  2. This is most likely out of scope (and rightfully so), but I was wondering if it could make sense (in the future) to allow paying extra for higher priority in bandwidth assignment. E.g., receipts containing metadata about an extra fee to be burned/redistributed somehow if the receipt is processed before block height x.
  3. Since the base_bandwidth is always assigned (unless the destination is congested), would it make sense to slightly change the semantics of the bandwidth scheduler configuration such that the base bandwidth would not be considered in the calculations at all and would always be granted implicitly? I could imagine that it could simplify the bandwidth scheduler. But I could also imagine that it would not and the currently proposed solution is better. Just wondering if such an option was considered.
  4. Is it possible that a value x at the end of the BandwidthRequestValues array becomes impossible to use if x + base_bandwidth exceeds max_shard_bandwidth? If this is the case, the alternative approach to base_bandwidth mentioned in the previous point might make this problem disappear.
  5. "Requests with the same allowance are processed in a random order." This doesn't seem to be explicitly deterministic. Is it necessary to specify some (even arbitrary) ordering on requests with the same allowance to keep the output of the bandwidth scheduler deterministic? Otherwise it looks underspecified (unless I'm missing something, which is possible). One could, for example, order the links lexicographically and rotate the order by block height modulo number of links (to ensure liveness).
  6. In some of the examples, shards send receipts to themselves. Does this actually consume bandwidth?
  7. How is the bandwidth allowance replenished? I.e., in the token bucket analogy, how fast are tokens added to the buckets?

Overall I find the design very nice and solid, and it seems that you put a lot of thought into handling all the details and corner cases.
I'll be happy to discuss any of the above points, here or on a call, if you find it meaningful. Also please let me know if anything is unclear.

@jancionear
Contributor Author

@jancionear commented on 0ff27d1 on Jan 14, 2025

Is it possible that, under a DoS attack, the queues just keep growing indefinitely?

There is a separate mechanism for backpressure between shards called "congestion control". You can check out the NEP: https://github.com/near/NEPs/blob/master/neps/nep-0539.md

TLDR: Every shard has a "congestion level" which measures how much work there is in the shard's queues. When the congestion level gets high, other shards stop sending receipts to this shard. To avoid deadlocks, there is one "allowed shard" which may still send receipts to the shard with the high congestion level.
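To make the rule concrete, here's a rough Rust sketch of the idea; the names, types, and threshold are illustrative, not the actual nearcore congestion-control code from NEP-539:

```rust
// Illustrative sketch only; the real implementation uses different types and thresholds.
struct ShardCongestionInfo {
    congestion_level: f64, // 0.0 = no congestion, 1.0 = fully congested
    allowed_shard: u64,    // the single shard still allowed to send when congested
}

/// Decide whether `sender` may send receipts to the shard described by `receiver`.
fn may_send_receipts(sender: u64, receiver: &ShardCongestionInfo, threshold: f64) -> bool {
    if receiver.congestion_level < threshold {
        true
    } else {
        // Deadlock avoidance: exactly one designated shard keeps sending.
        sender == receiver.allowed_shard
    }
}
```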

This is most likely out of scope (and rightfully so), but I was wondering if it could make sense (in the future) to allow to pay extra for being a higher priority in bandwidth assignment. E.g., receipts containing metadata about an extra fee to be burned/redistributed somehow if the receipt is processed before block height x.

There were some plans to add transaction priorities (NEP: #541), but AFAIK there is nothing concrete in the protocol yet, so for now I didn't take them into account. But in the future there could definitely be some kind of integration: the bandwidth request could contain additional information about priorities and the scheduler could take that into account 👍

Since the base_bandwidth is always assigned (unless the destination is congested), would it make sense to slightly change the semantics of the bandwidth scheduler configuration such that the base bandwidth would not be considered in the calculations at all and would always be granted implicitly?

Hmm, there is definitely something to it. We could decrease all bandwidth requests by base_bandwidth and then ignore the fact that base_bandwidth exists. When designing this, I thought about base_bandwidth in the same way as other bandwidth grants: it's something that has been granted before, and now we can grant more. I think both ways of thinking about it are valid, and I'm not sure if one is better than the other, but it's interesting to think about it from another perspective 👍
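As a purely hypothetical illustration of that alternative semantics (the helper name is made up), the scheduler would only ever see the part of a request above the implicitly granted base:

```rust
// Hypothetical: base_bandwidth is always granted implicitly, so the scheduler
// only needs to allocate whatever a shard requests above it.
fn request_above_base(requested_bytes: u64, base_bandwidth: u64) -> u64 {
    requested_bytes.saturating_sub(base_bandwidth)
}
```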

Is it possible that a value x at the end of the BandwidthRequestValues array becomes impossible to use if x + base_bandwidth exceeds max_shard_bandwidth? If this is the case, the alternative approach to base_bandwidth mentioned in the previous point might make this problem disappear.

That's a good point, I missed that 👍. I'll see if there's an easy way to fix that.
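For a purely illustrative instance with max_shard_bandwidth = 4.5MB: if base_bandwidth were, say, 100KB and the largest entry in BandwidthRequestValues were 4.45MB, that entry could never be granted on top of the base, since 4.45MB + 100KB > 4.5MB.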

"Requests with the same allowance are processed in a random order." This doesn't seem to be explicitly deterministic.

It's random, but in a deterministic way. There's a random number generator which is seeded in a deterministic way for every scheduler run. That part is indeed underspecified; I'll add a paragraph about it 👍
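As a rough sketch of what that could look like, assuming the rand and rand_chacha crates and a seed derived from per-block data such as the block height (the real seeding scheme may differ):

```rust
use rand::seq::SliceRandom;
use rand::SeedableRng;
use rand_chacha::ChaCha20Rng;

/// Shuffle the links that have equal allowance. Every validator derives the same
/// seed from the same block, so everyone ends up with the same "random" order
/// and the scheduler output stays deterministic.
fn shuffle_equal_allowance_links(links: &mut [(u64, u64)], block_height: u64) {
    let mut rng = ChaCha20Rng::seed_from_u64(block_height);
    links.shuffle(&mut rng);
}
```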

In some of the examples, shards send receipts to themselves. Does this actually consume bandwidth?

Yes, every shard has an outgoing buffer to itself and sends receipts to itself using this buffer. It consumes bandwidth because the chunk validators are stateless and need to download the incoming receipts for every validated chunk. Chunk validators are rotated often (AFAIK at every height), so the receipts need to be sent over the network.

How is the bandwidth allowance replenished? I.e., in the token bucket analogy, how fast are tokens added to the buckets?

There's a short note in the bandwidth scheduler example, but maybe I should add an explicit paragraph somewhere.
The allowance is increased by max_shard_bandwidth / num_shards, which corresponds to sending the same amount of data on all links and saturating the available bandwidth.

fair_link_bandwidth = max_shard_bandwidth / num_shards = 4.5MB/3 = 1.5MB
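A token-bucket style sketch of that replenishment, with illustrative field names and an assumed cap at max_shard_bandwidth (not the exact nearcore code):

```rust
/// Per-link allowance, replenished once per scheduler run.
struct LinkAllowance {
    bytes: u64, // tokens currently available on this link
}

fn replenish(link: &mut LinkAllowance, max_shard_bandwidth: u64, num_shards: u64) {
    // Each link gets an equal share of the shard's total bandwidth:
    // fair_link_bandwidth = 4_500_000 / 3 = 1_500_000 bytes in the example above.
    let fair_link_bandwidth = max_shard_bandwidth / num_shards;
    // Assumption: the allowance is capped so it never exceeds max_shard_bandwidth.
    link.bytes = (link.bytes + fair_link_bandwidth).min(max_shard_bandwidth);
}
```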

Btw, it would be better to have this discussion under the PR (#584) so that others can see it as well; I'll link to these comments from the PR.
