-
-
Notifications
You must be signed in to change notification settings - Fork 556
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[relay client] choose nearest relay based on latency #2952
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: M Essam Hamed <[email protected]>
Hi @mohamed-essam, When we designed this feature we focused on the fastest connection time as possible. In a corner case, your logic will flip this requirement and block the connection more then the defined timeout time. Imagine that situation if the length of the list of servers is 8. And you have two really slow servers. Actually one of them is between 1-7 and 8. In this case, the code will block until 2*30 sec. For your scenario, we suggest using only one domain with the Geo DNS service. Hard to write good code for the scheduler but we can take try. I think your solution could works well because the But back to your PR. What do you think if after this line, you check the length of the channel and if it contains more then one items, then drain the channel and compare the elapsed times on that? And do not wait for any new connections. |
Hello @pappz, Thanks for the clarification of the requirement, I haven't really considered this case in my change, I think it might be possible to do something that is a mix of both, as in most cases delaying initial connection by some amount (think for example 200ms), could result in a much faster relay off the get go. As for the Geo DNS, my problem with that solution is that it would rely only on geographical distance latency, and not take into account (for example) relay server load. I'm working on a commit now that should address both the corner case of slow servers (by having some sort of upper limit of waiting after the first successful connection, which I'm thinking a value between 200-500ms should be sufficient to both allow cases of Go scheduling delays and weird networking to resolve), and the measurement of latency (by using httptrace inside Please let me know what you think of that previous approach even if I haven't committed it yet 🙏 |
Measuring the time in deeper layers looks promising. Good idea. I will take a look and we will see. I am still thinking about the length of the result channel in a 1 core system as you mentioned in the ticket. Can I ask you for a test? Could you print out the length of the result channel every time when read a result from the channel?
If queued with multiple results, then we do not need to wait more. In the end, maybe we do not choose the fastest server but definitely do not choose the slowest. If this approach shows unexpected results then the time window sounds a good way. |
I don't have access to a single core machine today, so as a workaround I executed netbird -F with cpuaffinity to force it into a single CPU.
In this case (3 relays), there were 2 times where a connection was ready immediately after the last connection was processed. This also seemed to be dependent on whether the CPU was loaded or not, when loaded it was more likely to be always empty (before and after select statement). One more thing I noticed, is that when CPU was free, the order of the connections was almost always (somewhat) already sorted by latency, but when CPU was loaded, the order of connections was more likely to be the order in which they were configured in management. I'm assuming this is because the first item in the array is likely to be connected to first if the CPU load is high enough for the process to wait a bit until connecting to the next relay. All in all, the slight time delay seems to be always choosing the lowest latency in both cases. |
Thank you for sharing your experience. |
That's some nice work on the QUIC protocol, but given it's using the quic-go package I see no hint anywhere in it that it supports net/http/httptracer However, it seems like it already implements its own version of it in logging.ConnectionTracer. After some digging (due to the lack of documentation on that package), I found the equivalent methods of it we would need are
If the QUIC relay is merged before this PR, I could test the previous myself, if you can test the ConnectionTracer on the side and provide the difference between httptracer and ConnectionTracer values I believe it would provide a helpful insight. |
Quality Gate passedIssues Measures |
@mohamed-essam with the addition to QUIC proto support, we caused a merge conflict. can you check them? |
Will try to do so over the weekend🙏 |
Describe your changes
Change picker to choose nearest relay based on connection latency.
Issue ticket number and link
#2950
Checklist