Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sleepy Thread devices #249

Open
roysjosh opened this issue Nov 10, 2022 · 8 comments
Open

Sleepy Thread devices #249

roysjosh opened this issue Nov 10, 2022 · 8 comments

Comments

@roysjosh
Copy link
Contributor

chrysn/aiocoap#175 (comment)

I think a forum user is hitting this aiocoap issue. Average ping RTT to the sleepy accessory is ~2600ms, over the 2.0 aiocoap ACK_TIMEOUT.

2022-11-09 19:08:17.637 DEBUG (MainThread) [aiohomekit.controller.coap.connection] Pair verify uri=coap://[fdd8:9c7d:c2d1:0:d350:15d9:7a39:fc10]:5683/2
2022-11-09 19:08:17.638 DEBUG (MainThread) [aiohomekit.protocol.tlv] sending [
6 (State): (1 bytes/<class 'bytearray'>) 0x01
3 (PublicKey): (32 bytes/<class 'bytes'>) b'\xed\xc4\n\xeb\xf2E\xbe\r\x19P\xf3\x912\xf4\x1c\x0cq\xc8U\x9c\xdfP\xb1\x80\\\xc6\xcb\xc6\xcd~\x01+'
]

2022-11-09 19:08:20.009 INFO (MainThread) [coap-server] Retransmission, Message ID: 37481.

2022-11-09 19:08:20.745 ERROR (MainThread) [coap-server] Error received and ignored in this codepath: [Errno 113] Host is unreachable

Note that the retransmit triggers something (the BR?) to entirely kill the connection, probably since the first is still outstanding.

I'll put together a PR for aiocoap with a knob we'll have to adjust the CoAP controller here.

@roysjosh
Copy link
Contributor Author

Well, changing ACK_TIMEOUT from 2.0 to 12.0 didn't seem to help this particular user. I need to order a few sleepy Thread devices so I can get this sorted out.

@Jc2k
Copy link
Owner

Jc2k commented Nov 12, 2022

Wemo Stage and Eve Thermo are the sleepy HAP devices I'm testing with FWIW.

@Jc2k
Copy link
Owner

Jc2k commented Nov 15, 2022

As I said in ^, bumping ACK_TIMEOUT to 10 has got rid of thousands of "Received Type.ACK from but could not match it to a running exchange" warnings in my logs.

Looking at https://github.com/chrysn/aiocoap/blob/419c88cb7bec693f3c33b188a8e394e6666983c5/aiocoap/messagemanager.py#L215, they were caused by original transmission and the retransmission sharing an entry in self._active_exchanges. If both are ACK'd, original will clear the entire in active exchanges first, then the retransmission won't be able to find itself in active exchanges any more, causing that log spam.

So I think even if its not the root cause for your user, its something we need to resolve as that error message is very common and comes up a lot in HA tickets.

@jfroy
Copy link

jfroy commented Nov 15, 2022

I need to read the Thread spec to see how much buffering is allowed for sleepy devices. Bumping the timeout for connections to sleepy Thread devices would probably be an OK fix. Handling re-transmit ACKs per the COAP protocol spec would also be a good fix.

@Jc2k
Copy link
Owner

Jc2k commented Nov 15, 2022

I think we want both - it seems daft (and wasteful) to retransmit when we know sleepy thread devices will "timeout" half the time.

But equally if it's valid to ACK a retransmission and the original then aiocoap should be silently handling that.

@Jc2k
Copy link
Owner

Jc2k commented Nov 15, 2022

(But yes, I plucked 10s out of the air, we should use something closer to the spec - I thought it was 5s but haven't checked)

@Jc2k
Copy link
Owner

Jc2k commented Nov 22, 2022

This has been said in other tickets, but duping here.

Latest aiocoap should no longer RST in violation of spec and makes re-transmit dupe ACK's less noisy in the logs. They do however still generate noise, so we need to handle the timeouts and make them longer than the sleep interval of the device.

Sleepy devices I have do sleep for 5s, however it's set by a characteristic called "Sleep Interval". So we need to tune the retry interval per endpoint, respecting that char.

@Jc2k
Copy link
Owner

Jc2k commented Dec 1, 2022

chrysn/aiocoap#294 will let us avoid re-transmits during a sleepy devices sleep interval. We'll probably want a TransportTuning per connection, and we should use the "Sleep Interval" characteristic value if its available to configure it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants