Commit 0ff27d1 (1 parent: b2c75fe)
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 19 changed files with 1,383 additions and 0 deletions.
Binary files added:
- neps/assets/nep-bandwidth-scheduler/distribute_remaining_example_1.png (+88.8 KB)
- neps/assets/nep-bandwidth-scheduler/distribute_remaining_example_2.png (+88.6 KB)
- neps/assets/nep-bandwidth-scheduler/distribute_remaining_example_3.png (+89.6 KB)
- neps/assets/nep-bandwidth-scheduler/distribute_remaining_example_4.png (+92.4 KB)
- neps/assets/nep-bandwidth-scheduler/distribute_remaining_example_5.png (+94.6 KB)
- neps/assets/nep-bandwidth-scheduler/distribute_remaining_example_6.png (+94.4 KB)
- neps/assets/nep-bandwidth-scheduler/distribute_remaining_example_7.png (+95.7 KB)
- neps/assets/nep-bandwidth-scheduler/distribute_remaining_example_8.png (+96.7 KB)
Comment on commit 0ff27d1:
Hi, a disclaimer first. As a newcomer, my background and context are still quite limited (but I'm working on improving that), so my questions and remarks might be a bit naive. Feel free to ignore any or all of them that you don't find meaningful.
As far as I understand the problem, this seems to be a nice, practical, working solution. Here are some points I was thinking about while reading the document:
- Since `base_bandwidth` is always assigned (unless the destination is congested), would it make sense to slightly change the semantics of the bandwidth scheduler configuration so that the base bandwidth is not considered in the calculations at all and is always granted implicitly? I could imagine that this would simplify the bandwidth scheduler. But I could also imagine that it would not, and that the currently proposed solution is better. Just wondering if such an option was considered.
- Is it correct that a value `x` at the end of the `BandwidthRequestValues` array becomes impossible to use if `x + base_bandwidth` exceeds `max_shard_bandwidth`? If so, the alternative approach to `base_bandwidth` mentioned in the previous point might make this problem disappear.
Overall I find the design very nice and solid, and it seems that you put a lot of thought into dealing with all the details and corner cases.
I'll be happy to discuss any of the above points, here or on a call, if you find it meaningful. Also, please let me know if anything is unclear.
Comment on commit 0ff27d1:
There is a separate mechanism for backpressure between shards, called "congestion control". You can check out the NEP: https://github.com/near/NEPs/blob/master/neps/nep-0539.md
TL;DR: Every shard has a "congestion level" which measures how much work there is in the shard's queues. When the congestion level gets high, other shards stop sending receipts to this shard. To avoid deadlocks, there is one "allowed shard" which is still allowed to send receipts to the shard with a high congestion level.
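For readers unfamiliar with NEP-539, the send-side rule summarized above can be sketched roughly as follows. This is a simplified Rust illustration, not the nearcore implementation: the `CongestionInfo` type, the normalized `congestion_level` in [0.0, 1.0], and the `MAX_CONGESTION` cutoff are assumptions made for the example, and any gradual limits NEP-539 may apply below full congestion are left out.

```rust
/// Hypothetical cutoff at which a shard counts as fully congested.
/// The real thresholds are defined in NEP-539; this value is illustrative only.
const MAX_CONGESTION: f64 = 1.0;

/// Simplified, hypothetical view of what a receiving shard publishes.
struct CongestionInfo {
    /// Normalized congestion level in [0.0, 1.0].
    congestion_level: f64,
    /// The single shard that may still send receipts under full congestion.
    allowed_shard: u64,
}

/// Returns whether `sender` may forward receipts to a receiver with the given congestion info.
fn can_send(sender: u64, receiver: &CongestionInfo) -> bool {
    if receiver.congestion_level < MAX_CONGESTION {
        // Not fully congested: sending is permitted in this simplified model.
        true
    } else {
        // Fully congested: only the designated "allowed shard" may send,
        // so some progress is always possible and deadlocks are avoided.
        sender == receiver.allowed_shard
    }
}
```

Because exactly one sender keeps the right to send under full congestion, the congested shard keeps receiving work at a bounded rate instead of either being flooded or starved.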
There were some plans to add transaction priorities (NEP: #541), but AFAIK there is nothing concrete in the protocol yet, so for now I didn't take them into account. But in the future there could definitely be some kind of integration, the bandwidth request could contain additional information about priorities and the scheduler could take that into account 👍
Hmm, there is definitely something to it. We could decrease all bandwidth requests by `base_bandwidth` and then ignore the fact that `base_bandwidth` exists. When designing, I thought about `base_bandwidth` in the same way as other bandwidth grants: it's something that has been granted before, and now we can grant more. I think both ways of thinking about it are valid. I'm not sure if one is better than the other, but it's interesting to think about it from another perspective 👍
That's a good point, I missed that 👍. I'll see if there's an easy way to fix that.
It's random, but in a deterministic way. There's a random number generator which is seeded deterministically for every scheduler run. That's currently underspecified; I'll add a paragraph about it 👍
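To illustrate what "random, but in a deterministic way" means, here is a small Rust sketch: a seeded generator drives a Fisher-Yates shuffle, so every node that uses the same seed computes the same order. SplitMix64 is swapped in here only because it is tiny and dependency-free. The generator the scheduler actually uses, and how its seed is derived for each run, are not specified in this discussion, so treat `seed` as an assumed input.

```rust
/// SplitMix64: a small, well-known deterministic PRNG (illustrative stand-in only).
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        SplitMix64 { state: seed }
    }

    /// Produces the next pseudo-random 64-bit value; same seed, same sequence.
    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Deterministic Fisher-Yates shuffle: all nodes using the same `seed`
/// (hypothetically derived from shared on-chain data) get the same order.
fn deterministic_shuffle<T>(items: &mut [T], seed: u64) {
    let mut rng = SplitMix64::new(seed);
    for i in (1..items.len()).rev() {
        // Modulo reduction has a tiny bias; acceptable for an illustration.
        let j = (rng.next_u64() % (i as u64 + 1)) as usize;
        items.swap(i, j);
    }
}
```

The important property for consensus is not the quality of the randomness but that every validator reproduces exactly the same sequence from the same seed.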
Yes, every shard has an outgoing buffer to itself and sends receipts to itself using this buffer. This consumes bandwidth because chunk validators are stateless and need to download the incoming receipts for every validated chunk. Chunk validators are rotated often (AFAIK at every height), so the receipts need to be sent over the network.
There's a short note in the bandwidth scheduler example, but maybe I should add an explicit paragraph somewhere.
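One way to see why the self-buffer matters: if each shard's link to itself is scheduled like any other link, then with `num_shards` shards there are `num_shards * num_shards` links, including the diagonal. A minimal Rust sketch, assuming shard ids are the contiguous range `0..num_shards` (a simplification; real shard ids need not be contiguous):

```rust
/// Enumerates every (sender, receiver) link, including each shard's link to itself.
/// Assumes, for illustration, that shard ids are 0..num_shards.
fn all_links(num_shards: u64) -> Vec<(u64, u64)> {
    let mut links = Vec::new();
    for sender in 0..num_shards {
        for receiver in 0..num_shards {
            // sender == receiver is a real link too: receipts a shard sends to
            // itself still have to be downloaded by its stateless chunk validators.
            links.push((sender, receiver));
        }
    }
    links
}
```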
The allowance is increased by `max_shard_bandwidth / num_shards`, which corresponds to sending the same amount of data on all links and saturating the available bandwidth.
Btw, it would be better to have this discussion under the PR (#584) so that others can see it as well; I'll link to these comments from the PR.
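The allowance arithmetic above can be made concrete with a short Rust sketch. It assumes, for illustration only, that allowances are stored per (sender, receiver) link in a `HashMap` and that integer division is used; any cap on how much allowance can accumulate is not covered in this thread and is omitted. With hypothetical values `max_shard_bandwidth = 4_500_000` bytes and `num_shards = 6`, each link gains 750,000 bytes per update, so a sender using its full allowance on all 6 outgoing links (its own self-link included) sends exactly 4,500,000 bytes.

```rust
use std::collections::HashMap;

/// Hypothetical representation: allowance in bytes per (sender, receiver) link.
type Allowances = HashMap<(u64, u64), u64>;

/// Raises the allowance of every link by `max_shard_bandwidth / num_shards`.
/// If every sender used exactly this much on each of its `num_shards` outgoing
/// links, each shard would send (and receive) `max_shard_bandwidth` in total.
fn increase_allowances(allowances: &mut Allowances, num_shards: u64, max_shard_bandwidth: u64) {
    if num_shards == 0 {
        // No links to update; also avoids dividing by zero.
        return;
    }
    let increment = max_shard_bandwidth / num_shards;
    for sender in 0..num_shards {
        for receiver in 0..num_shards {
            let allowance = allowances.entry((sender, receiver)).or_insert(0);
            *allowance = allowance.saturating_add(increment);
        }
    }
}
```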