Ability to override default timeouts #175
I concur. In my scenario I don't really need to change it on a per-request basis; I don't know if that makes sense, but I was thinking of a way of globally changing those values for my scenario (IoT with sleepy end devices and huge latencies). A very straightforward way of achieving this is to create a Constants class with all the relevant values, i.e.:

    class Constants:
        ACK_TIMEOUT = 10.0
        MAX_RETRANSMIT = 6
        ...

This is a way to allow the application to change those values. However, it implies a heavy refactoring -- inconsequential, but ubiquitous. Is this something desirable for this library?
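For illustration, here is a minimal sketch of how application code might use such a holder, assuming the proposal above were adopted; the `Constants` class and its use here are hypothetical and not part of aiocoap's current API, and the shipped defaults shown are the RFC 7252 values.

```python
# Hypothetical usage sketch of the Constants-class proposal above; this class
# does not exist in aiocoap today. Attribute names mirror RFC 7252 parameters.

class Constants:
    ACK_TIMEOUT = 2.0        # RFC 7252 default
    ACK_RANDOM_FACTOR = 1.5  # RFC 7252 default
    MAX_RETRANSMIT = 4       # RFC 7252 default


# Application startup code: adjust once, globally, before creating any CoAP
# contexts, instead of editing module-level constants in the library source.
Constants.ACK_TIMEOUT = 10.0
Constants.MAX_RETRANSMIT = 6
```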
I think we're starting to hit this for HA + `homekit_controller` & sleepy Thread + HAP accessories. The average ping time for one individual is ~2600ms, over the 2.0s `ACK_TIMEOUT`. Worse, the retransmits are shot down somewhere, maybe at the Border Router, killing the connection attempt entirely.
> On Thu, Nov 10, 2022 at 10:50:23AM -0800, Joshua Roys wrote:
> I think we're starting to hit this for HA + `homekit_controller` & sleepy Thread + HAP accessories. The average ping time for one individual is ~2600ms, over the 2.0s `ACK_TIMEOUT`. Worse, the retransmits are shot down somewhere, maybe at the Border Router, killing the connection attempt entirely.

I'll have a look at it next time I get my hands on aiocoap (which might be some time given I'm a bit swamped ATM). But one thing in advance: if retransmits are swallowed by a router, chances are you'll run into trouble no matter the timeout. (If the BR were acting as a proper intercepting proxy, it'd send an ACK and manage retransmits -- a behavior I'd even encourage if it were explicit and not intercepting.) So even when this becomes configurable, please still look at what swallows the messages, or how the BR behaves.
Thread border routers are physical, link, network, and transport layer devices, with IPv6 as the native Thread network layer (BRs can provide NAT to integrate with IPv4-only LANs) and otherwise being transparent for end-to-end IP communication. So there is no expectation that a BR would understand CoAP and provide proxy services for it. That being said, Thread 1.3 added mandatory support for DNS service discovery and registration proxying (to avoid the costs of multicast mDNS on the Thread mesh network and to allow sleepy Thread devices to, well, sleep), so it may be the case that in the future other application protocols or network services will be specifically handled by BRs. I don't have data proving or disproving that retransmits are shot down, but based on the Thread specification a well-behaved BR should not be filtering packets in such a manner (no deep packet inspection allowing application protocol-specific rate limiting or filtering). Of course, with UDP, there is no delivery guarantee, and that seems like a likely-enough explanation.
I'm not sure whether it is the border router or perhaps a Thread router, but something is sending icmp6 "no route to host" errors back to HA. It appears to match up with the retransmit to a sleepy device but I haven't been able to reproduce this with my small network of FTD nodes. I'm leaning towards a Thread router trying to indicate that it can't reach a child node...
Thread devices are required to implement Destination Unreachable (type 1) icmp6 messages (specifically RFC 4443 section 3.1), so any node may have sent that back. As you speculate, it is likely the border router, though I did not find code implementing that behavior in OpenThread. Destination Unreachable (type 1) with code 0 (No route to destination) icmp6 messages are sent by FTDs when their EID (endpoint identifier)-to-RLOC (routing locator) cache contains an invalid entry. Both EIDs and RLOCs are IPv6 addresses, but EIDs are visible to applications and do not change for a given device even if the mesh topology changes. RLOCs are private IPv6 addresses used to actually deliver datagrams and do change when the mesh topology changes. However, I don't think you'd ever see those messages forwarded outside of the Thread mesh.
Exploring how to fix this: The high-level messages aiocoap usually handles and the nitty-gritty details of transports are quite decoupled. I'm leaning towards having a bunch of parameters in an object, similar to what @alexbarcelo suggested. These would take the current (module based) constants as defaults. I'm not sure how to guide the selection of that object. How would you prefer to configure it, or how would you know which parameters to choose? Would it work to have these as hints on the message, so that the client sets these hints like it sets whether it'd rather have this CON or NON? Would that work for the response as well? Would it be more practical to have a per-context configurable decision function that looks at the address (say, looks up whether the address is in a network known to be a Thread managed one) and decides which set of defaults to use? See the sketch below for what that last option could look like.
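A purely illustrative sketch of that per-context decision-function idea; none of the names, the hook, or the Thread prefix below are existing aiocoap API, they only show the shape such a selector might take.

```python
import ipaddress

# Hypothetical Thread mesh prefix; in practice this would come from the
# application's knowledge of its network.
THREAD_MESH = ipaddress.ip_network("fd11:22::/32")


def select_transport_params(remote_address: str) -> dict:
    """Return timing defaults depending on which network the peer lives in."""
    if ipaddress.ip_address(remote_address) in THREAD_MESH:
        return {"ACK_TIMEOUT": 10.0, "MAX_RETRANSMIT": 6}  # sleepy Thread devices
    return {"ACK_TIMEOUT": 2.0, "MAX_RETRANSMIT": 4}        # RFC 7252 defaults


# A context could then be handed such a hook at creation time, e.g.
# Context.create_client_context(transport_param_selector=select_transport_params)
# (a hypothetical keyword argument, shown only to illustrate the API shape).
```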
Taking things up from another thread / @Jc2k:
Not being familiar with the details of Thread, I'll assume that the sleeping device is a server, and has been discovered and possibly been probed for that characteristic. Would you, then, consider it practical to pass a parameter object in with each request sent to a peer of which a sleep value is known?
From an API usability POV I'd probably want to defer to @roysjosh here, as he did the hard work on this; it looks like we could work with that...
Please have a look at #294 to see whether that'd help with your use case. The idea is that you'd subclass TransportTuning (e.g. to a form that takes ACK_TIMEOUT as an instance parameter) and then pass that into every message you send to a known-sleepy node as the transport_tuning parameter.
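For concreteness, a rough sketch of what that could look like on the client side, assuming the `TransportTuning` class and the `transport_tuning` message parameter land as described in #294; the import path, subclass, address, and values are illustrative, not a confirmed API.

```python
import asyncio

import aiocoap
from aiocoap.numbers.constants import TransportTuning  # assumed location per #294


class SleepyTuning(TransportTuning):
    """Retransmission parameters for a known-sleepy peer (values are illustrative)."""

    def __init__(self, ack_timeout: float, max_retransmit: int):
        # Instance attributes shadow the library defaults (2.0 s / 4 retransmits
        # per RFC 7252) only for messages carrying this tuning object.
        self.ACK_TIMEOUT = ack_timeout
        self.MAX_RETRANSMIT = max_retransmit


async def main():
    ctx = await aiocoap.Context.create_client_context()
    request = aiocoap.Message(
        code=aiocoap.GET,
        uri="coap://[2001:db8::1]/temperature",  # hypothetical sleepy-device address
        transport_tuning=SleepyTuning(ack_timeout=10.0, max_retransmit=6),
    )
    response = await ctx.request(request).response
    print(response.payload)


asyncio.run(main())
```

Requests to peers that are not known to sleep would simply omit the parameter and keep the library defaults.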
There are a number of timeouts that are currently defined as constants: https://aiocoap.readthedocs.io/en/latest/module/aiocoap.numbers.constants.html#aiocoap.numbers.constants.REQUEST_TIMEOUT
Some of those are not in the RFC at all. Even if they are, it would be nice to be able to set them on a per-request or per-context level.
The motivation is that in a specific application network known to have relatively low latency (but some packet loss), it may be desirable to have faster retries and custom timeouts.