Ability to override default timeouts #175

Closed
weili-jiang opened this issue Oct 17, 2019 · 11 comments

Comments

@weili-jiang
Contributor

There are a number of timeouts that are currently defined as constants: https://aiocoap.readthedocs.io/en/latest/module/aiocoap.numbers.constants.html#aiocoap.numbers.constants.REQUEST_TIMEOUT

Some of those are not in the RFC at all. Even for those that are, it would be nice to be able to set them at a per-request or per-context level.

The motivation is that in a specific application network known to have relatively low latency (but packet loss), it may be desirable to have faster retries and custom timeouts.
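
For reference, a minimal sketch of how the retransmission parameters interact, using the MAX_TRANSMIT_SPAN and MAX_TRANSMIT_WAIT formulas from RFC 7252 section 4.8.2. The helper names and the LOW_LATENCY values are hypothetical and only illustrate the "faster retries" case; they are not part of aiocoap.

```python
# Derived CoAP timing values, following the formulas in RFC 7252 section 4.8.2.
# Function names and the LOW_LATENCY profile are illustrative assumptions.

def max_transmit_span(ack_timeout, ack_random_factor, max_retransmit):
    """Longest time from the first transmission to the last retransmission."""
    return ack_timeout * (2 ** max_retransmit - 1) * ack_random_factor


def max_transmit_wait(ack_timeout, ack_random_factor, max_retransmit):
    """Longest time from the first transmission until the sender gives up."""
    return ack_timeout * (2 ** (max_retransmit + 1) - 1) * ack_random_factor


# Default transmission parameters from RFC 7252.
RFC_DEFAULTS = dict(ack_timeout=2.0, ack_random_factor=1.5, max_retransmit=4)

# Hypothetical profile for a low-latency but lossy network.
LOW_LATENCY = dict(ack_timeout=0.5, ack_random_factor=1.5, max_retransmit=4)

for name, params in (("RFC defaults", RFC_DEFAULTS), ("low latency", LOW_LATENCY)):
    span = max_transmit_span(**params)
    wait = max_transmit_wait(**params)
    print(f"{name}: span={span} s, wait={wait} s")
```

With the RFC defaults a confirmable message can take up to 93 s before the sender gives up, which is the delay a shorter ACK_TIMEOUT would reduce.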

@alexbarcelo

I concur. In my scenario I don't really need to change them per request (I don't know if that even makes sense); I was thinking of a way to change those values globally for my scenario (IoT with sleepy end devices and huge latencies).

A very straightforward way of achieving this is to create a Constants class holding all the values, i.e.:

class Constants:
    ACK_TIMEOUT = 10.0
    MAX_RETRANSMIT = 6
    ...

This would allow the application to change those values. However, it implies a heavy refactoring: each individual change is inconsequential, but they are ubiquitous.

Is this something desirable for this library?

@roysjosh
Contributor

I think we're starting to hit this for HA + homekit_controller & sleepy Thread + HAP accessories. The average ping time for one individual is ~2600ms, over the 2.0s ACK_TIMEOUT. Worse, the retransmits are shot down somewhere, maybe at the Border Router, killing the connection attempt entirely.
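
To illustrate the mismatch described above, here is a rough sketch. It assumes the initial retransmission timeout is drawn uniformly between ACK_TIMEOUT and ACK_TIMEOUT * ACK_RANDOM_FACTOR (RFC 7252 only says "a random duration" in that range, so the uniform draw is an assumption), and that ACK_RANDOM_FACTOR has its RFC default of 1.5. The function name is hypothetical.

```python
# Estimate how often a CON message is retransmitted even though no packet was
# lost, simply because the round-trip time exceeds the initial timeout.
# Assumption: the initial timeout is uniform in
# [ACK_TIMEOUT, ACK_TIMEOUT * ACK_RANDOM_FACTOR].

ACK_TIMEOUT = 2.0        # seconds, the value mentioned in this thread
ACK_RANDOM_FACTOR = 1.5  # RFC 7252 default


def spurious_retransmit_probability(rtt, ack_timeout=ACK_TIMEOUT,
                                    ack_random_factor=ACK_RANDOM_FACTOR):
    """Probability that the first timeout fires before a reply can arrive."""
    low = ack_timeout
    high = ack_timeout * ack_random_factor
    if rtt <= low:
        return 0.0
    if rtt >= high:
        return 1.0
    return (rtt - low) / (high - low)


# An average round trip of about 2600 ms against a 2.0 s ACK_TIMEOUT.
print(round(spurious_retransmit_probability(2.6), 2))
```

Under these assumptions, more than half of the requests to such a device would trigger at least one retransmission before the first reply could arrive.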

@chrysn
Owner

chrysn commented Nov 10, 2022 via email

@jfroy

jfroy commented Nov 15, 2022

On Thu, Nov 10, 2022 at 10:50:23AM -0800, Joshua Roys wrote:
I think we're starting to hit this for HA + homekit_controller & sleepy Thread + HAP accessories. The average ping time for one individual is ~2600ms, over the 2.0s ACK_TIMEOUT. Worse, the retransmits are shot down somewhere, maybe at the Border Router, killing the connection attempt entirely.

I'll have a look at it next time I get my hands on aiocoap (which might be some time given I'm a bit swamped ATM). But one thing in advance: If retransmits are swallowed by a router, chances are you'll run into trouble no matter the timeout. (If the BR were acting as a proper intercepting proxy, it'd send an ACK and manage retransmits -- a behavior I'd even encourage if it were explicit and not intercepting). So even when this becomes configurable, please still look at what swallows the messages, or how the BR behaves.

Thread border routers are physical, link, network, and transport layer devices, with IPv6 as the native Thread network layer (BRs can provide NAT to integrate with IPv4-only LANs), and are otherwise transparent for end-to-end IP communication. So there is no expectation that a BR would understand CoAP and provide proxy services for it.

That being said, Thread 1.3 added mandatory support for DNS service discovery and registration proxying (to avoid the costs of multicast mDNS on the Thread mesh network and to allow sleepy Thread devices to, well, sleep), so it may be the case that in the future other application protocols or network services will be specifically handled by BRs.

I don't have data proving or disproving that retransmits are shot down, but based on the Thread specification a well-behaved BR should not be filtering packets in such a manner (no deep packet inspection allowing application protocol-specific rate limiting or filtering). Of course, with UDP, there is no delivery guarantee, and that seems like a likely-enough explanation.

@roysjosh
Contributor

I'm not sure whether it is the border router or perhaps a Thread router, but something is sending ICMPv6 "no route to host" errors back to HA. It appears to match up with the retransmit to a sleepy device, but I haven't been able to reproduce this with my small network of FTD nodes. I'm leaning towards a Thread router trying to indicate that it can't reach a child node...

2022-11-09 19:13:39.127 DEBUG (MainThread) [aiohomekit.controller.coap.connection] Pair verify uri=coap://[fdd8:9c7d:c2d1:0:cfc3:554d:edca:32d4]:5683/2
2022-11-09 19:13:39.139 DEBUG (MainThread) [aiohomekit.controller.coap.connection] Pair verify uri=coap://[fdd8:9c7d:c2d1:0:17d8:ee7d:c2ad:6eb6]:5683/2
2022-11-09 19:13:39.147 DEBUG (MainThread) [aiohomekit.controller.coap.connection] Pair verify uri=coap://[fdd8:9c7d:c2d1:0:7935:fc6e:69a2:fe5f]:5683/2
2022-11-09 19:13:39.156 DEBUG (MainThread) [aiohomekit.controller.coap.connection] Pair verify uri=coap://[fdd8:9c7d:c2d1:0:4b1b:ed70:dc3a:7438]:5683/2
2022-11-09 19:13:39.159 DEBUG (MainThread) [aiohomekit.controller.coap.connection] Pair verify uri=coap://[fdd8:9c7d:c2d1:0:d350:15d9:7a39:fc10]:5683/2
2022-11-09 19:13:39.170 DEBUG (MainThread) [aiohomekit.controller.coap.connection] Pair verify uri=coap://[fdd8:9c7d:c2d1:0:ba37:3c9b:7899:87a5]:5683/2
2022-11-09 19:13:41.566 INFO (MainThread) [coap-server] Retransmission, Message ID: 52364.
2022-11-09 19:13:41.793 INFO (MainThread) [coap-server] Retransmission, Message ID: 33951.
2022-11-09 19:13:41.800 INFO (MainThread) [coap-server] Retransmission, Message ID: 50357.
2022-11-09 19:13:41.894 INFO (MainThread) [coap-server] Retransmission, Message ID: 10841.
2022-11-09 19:13:42.063 INFO (MainThread) [coap-server] Retransmission, Message ID: 42904.
2022-11-09 19:13:42.119 INFO (MainThread) [coap-server] Retransmission, Message ID: 56879.
2022-11-09 19:13:42.211 ERROR (MainThread) [coap-server] Error received and ignored in this codepath: [Errno 113] Host is unreachable
2022-11-09 19:13:42.274 ERROR (MainThread) [coap-server] Error received and ignored in this codepath: [Errno 113] Host is unreachable
2022-11-09 19:13:42.277 ERROR (MainThread) [coap-server] Error received and ignored in this codepath: [Errno 113] Host is unreachable
2022-11-09 19:13:42.280 ERROR (MainThread) [coap-server] Error received and ignored in this codepath: [Errno 113] Host is unreachable
2022-11-09 19:13:42.282 ERROR (MainThread) [coap-server] Error received and ignored in this codepath: [Errno 113] Host is unreachable
2022-11-09 19:13:42.284 ERROR (MainThread) [coap-server] Error received and ignored in this codepath: [Errno 113] Host is unreachable

@jfroy

jfroy commented Nov 18, 2022

Thread devices are required to implement Destination Unreachable (type 1) ICMPv6 messages (specifically RFC 4443 section 3.1), so any node may have sent that back. As you speculate, it is likely the border router, though I did not find code implementing that behavior in OpenThread.

Destination Unreachable (type 1) ICMPv6 messages with code 0 (No route to destination) are sent by FTDs when their EID (endpoint identifier)-to-RLOC (routing locator) cache contains an invalid entry. Both EIDs and RLOCs are IPv6 addresses, but EIDs are visible to applications and do not change for a given device even if the mesh topology changes. RLOCs are private IPv6 addresses used to actually deliver datagrams, and they do change when the mesh topology changes. However, I don't think you'd ever see those messages forwarded outside of the Thread mesh.

@chrysn
Owner

chrysn commented Nov 21, 2022

Exploring how to fix this: The high-level messages aiocoap usually handles and the nitty-gritty details of transports are quite decoupled.

I'm leaning towards having a bunch of parameters in an object, similar to how @alexbarcelo suggested. These would take the current (module based) constants as defaults.

I'm not sure how to guide the selection of that object. How would you prefer to configure it, or how would you know which parameters to choose? Would it work to have these as hints on the message, so that the client sets these hints like it sets whether it'd rather have this sent as CON or NON? Would that work for the response as well? Would it be more practical to have a per-context configurable decision function that looks at the address (say, looks up whether the address is in a network known to be a Thread managed one) and decides which set of defaults to use?
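
A minimal sketch of what the address-based decision function could look like. Everything here is hypothetical: the TuningProfile type, the profile values (the sleepy ones are borrowed from the Constants example earlier in this thread), and the assumption that the fdd8:9c7d:c2d1::/64 prefix seen in the logs above is the Thread-managed network. It only uses the standard library and does not touch aiocoap.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class TuningProfile:
    """Hypothetical container for per-peer transport parameters."""
    ack_timeout: float
    max_retransmit: int


# Defaults as in RFC 7252; sleepy values taken from the Constants example.
DEFAULT_PROFILE = TuningProfile(ack_timeout=2.0, max_retransmit=4)
SLEEPY_PROFILE = TuningProfile(ack_timeout=10.0, max_retransmit=6)

# Assumption: networks known to be Thread managed (prefix taken from the logs,
# prefix length guessed).
THREAD_NETWORKS = [ipaddress.ip_network("fdd8:9c7d:c2d1::/64")]


def choose_profile(host):
    """Pick a parameter set based on the peer's IP address."""
    addr = ipaddress.ip_address(host)
    if any(addr in net for net in THREAD_NETWORKS):
        return SLEEPY_PROFILE
    return DEFAULT_PROFILE


print(choose_profile("fdd8:9c7d:c2d1:0:cfc3:554d:edca:32d4"))
print(choose_profile("2001:db8::1"))
```

A per-context hook like this keeps the policy in one place, whereas per-message hints leave the choice to each call site.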

@chrysn
Owner

chrysn commented Nov 21, 2022

Taking things up from another thread / @Jc2k:

The exact sleep interval is available over the HAP protocol through characteristic 0000023A-0000-1000-8000-0026BB765291. For all my battery-powered Thread devices, it's 5s. But that's probably not representative.

Not being familiar with details of Thread I'll assume that the sleeping device is a server, and has been discovered and possibly been probed for that characteristic. Would you, then, consider it practical to pass a parameter object in with each request sent to a peer of which a sleep value is known?

@Jc2k

Jc2k commented Nov 21, 2022

From an API usability POV, I'd probably want to defer to @roysjosh here as he did the hard work on this, but it looks like we could work with that...

@chrysn
Owner

chrysn commented Nov 21, 2022

Please have a look at #294 to see whether that'd help with your use case.

The idea is that you'd subclass TransportTuning (e.g. to a form that takes ACK_TIMEOUT as an instance parameter) and then pass that into every message you send to a known-sleepy node as the transport_tuning parameter.
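
A rough sketch of that subclassing pattern. To stay runnable without aiocoap, it defines a local stand-in for TransportTuning; the attribute names are assumed to mirror the existing module constants, and SleepyNodeTuning and the 6.0 s value are made up for illustration. In real code the base class would come from aiocoap instead.

```python
class TransportTuning:
    """Local stand-in for the class proposed in #294 (not the real one).

    Assumption: the real class exposes the same names as the module constants.
    """
    ACK_TIMEOUT = 2.0
    ACK_RANDOM_FACTOR = 1.5
    MAX_RETRANSMIT = 4


class SleepyNodeTuning(TransportTuning):
    """Hypothetical subclass that takes ACK_TIMEOUT as an instance parameter."""

    def __init__(self, ack_timeout):
        # The instance attribute shadows the class-level default.
        self.ACK_TIMEOUT = ack_timeout


# Illustrative value: a 5 s sleep interval plus some margin.
tuning = SleepyNodeTuning(ack_timeout=6.0)

# With aiocoap, this object would then be handed to each message sent to the
# sleepy node via the transport_tuning parameter mentioned above.
print(tuning.ACK_TIMEOUT, tuning.MAX_RETRANSMIT)
```

Parameters that are not overridden keep their defaults, so a subclass only needs to name what differs for the peer in question.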

@chrysn chrysn closed this as completed in 96eeb67 Nov 21, 2022
@chrysn
Owner

chrysn commented Nov 21, 2022

This was accidentally closed when I entered the wrong issue number into 96eeb67 -- that should have closed #288.

@chrysn chrysn reopened this Nov 21, 2022
@chrysn chrysn closed this as completed in 61a73ff Mar 28, 2023