hcpy stops working after a while until I reboot the hcpy docker container #120

nicx · 2025-01-17T11:07:34Z

When I start hcpy (in my case as a docker container), everything works as expected. I can monitor my dishwasher and washer, and I can start them. Great so far.

After some time it just stops working. I get no more value updates and cannot start my washer or dishwasher anymore. After a restart of the hcpy docker container it immediately works again. In the log file the restart was at 2025-01-17 11:54:33.707763.

log.txt

I already tried switching from DNS name to IP, but with no luck. The problem remains the same.

Is there anything I can do to find out the reason for that?

And as a workaround: is there anything (e.g. an HA sensor) I could monitor to detect when the problem occurs? With that I could automate a restart of the container.

Meatballs1 · 2025-01-19T22:53:16Z

I get similar issues occasionally with my dishwasher: the socket appears to stay open but there's no further traffic from the device.

I think we are doing everything correctly to monitor and alert on socket timeouts etc., but I need to check if it's actually staying open. I also wonder if the socket closing may not be signalled fully into docker, but that seems unlikely.

We potentially need some watchdog that restarts the connection if no traffic is seen for a certain amount of time. We could also attempt to send a status request to each device periodically.
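The watchdog idea could be sketched roughly as follows. This is only an illustration and not code from hcpy: the class name `TrafficWatchdog`, its `touch()`/`check()` methods, and the `on_stale` callback are all hypothetical, and what "restart the connection" means is left to the caller.

```python
import threading
import time
from typing import Callable


class TrafficWatchdog:
    """Fires on_stale() when no traffic was recorded for `timeout` seconds.

    Hypothetical helper for illustration; not part of hcpy.
    """

    def __init__(self, timeout: float, on_stale: Callable[[], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self.on_stale = on_stale
        self._clock = clock          # injectable so the logic can be tested
        self._last_seen = clock()
        self._lock = threading.Lock()

    def touch(self) -> None:
        """Record that a message was just received from the device."""
        with self._lock:
            self._last_seen = self._clock()

    def check(self) -> bool:
        """Return True (and call on_stale) if the connection looks stale."""
        with self._lock:
            stale = self._clock() - self._last_seen > self.timeout
            if stale:
                # Reset the timer so the callback does not fire again at once.
                self._last_seen = self._clock()
        if stale:
            self.on_stale()
        return stale
```

The assumption is that `touch()` would be called from whatever handler receives device messages, and `check()` from a periodic timer or loop. The `on_stale` callback could close and reopen the device connection, or first send a status request and only reconnect if that goes unanswered.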

nicx · 2025-01-20T08:19:40Z

@Meatballs1 I have continued to observe the behavior. In my case, hcpy seems to lose the connection to the mosquitto MQTT broker at some point. Since I do a nightly restart of my docker containers for backup purposes, I have now excluded mosquitto and hcpy from this. I will continue to observe and give feedback.

nicx · 2025-01-30T07:08:24Z

@Meatballs1 After a few more days of observing the behaviour, it seems that the connection is almost 100% stable if neither of the two docker containers is restarted. As soon as one of the two containers is restarted, the devices lose the connection after some time. It does not matter which of the containers is restarted, MQTT or hcpy.

Can we find out where exactly the problem is? Is it hcpy not doing an MQTT reconnect? Is it the devices closing the connection to hcpy? Is it maybe several different problems?

Meatballs1 · 2025-01-30T09:46:09Z

I assume you mean the HA or the HCPY docker?

You don't see "ERROR MQTT client disconnected:" in the logs at all?

I think maybe adding a timeout to loop_forever may be what we are missing, as it should auto reconnect by default:

https://eclipse.dev/paho/files/paho.mqtt.python/html/client.html

edit: although it looks like it may default to 1s anyway?

nicx · 2025-01-30T09:59:56Z

@Meatballs1 No, I mean the hcpy and mosquitto containers. I still reboot my HA container every night without problems. And no, I do not see any MQTT client disconnected errors.

Meatballs1 · 2025-01-30T10:04:04Z

Ah yes, I'm wondering if we need to use loop_start() to thread the MQTT client, or we have some other multithreading deadlock happening. Will need to investigate a bit.
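One way to rule out cross-thread publishing as a factor would be to route every publish through a single worker thread. The sketch below is an assumption about how that could look, not what hcpy or either of its branches actually does; `PublishQueue` and its methods are hypothetical names, and `publish` is any callable taking a topic and a payload.

```python
import queue
import threading
from typing import Callable, Optional, Tuple

Message = Tuple[str, str]  # (topic, payload)


class PublishQueue:
    """Funnels publishes from many threads through one worker thread.

    Hypothetical helper for illustration; not part of hcpy.
    """

    def __init__(self, publish: Callable[[str, str], None]) -> None:
        self._publish = publish
        self._queue: "queue.Queue[Optional[Message]]" = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._worker.start()

    def submit(self, topic: str, payload: str) -> None:
        """Safe to call from any device thread; only enqueues the message."""
        self._queue.put((topic, payload))

    def stop(self) -> None:
        """Publish what is already queued, then stop the worker."""
        self._queue.put(None)
        self._worker.join()

    def _run(self) -> None:
        while True:
            item = self._queue.get()
            if item is None:  # sentinel placed by stop()
                return
            topic, payload = item
            try:
                self._publish(topic, payload)
            except Exception as exc:
                # Keep the worker alive so one failed publish cannot stall the rest.
                print(f"publish to {topic} failed: {exc}")
```

With paho, the callable passed in could be something like `lambda topic, payload: client.publish(topic, payload)`, with the network loop running in its own thread via `loop_start()`. Device threads would then only ever touch the queue, which makes a deadlock between them and the MQTT client easier to rule in or out.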

nicx · 2025-01-30T10:17:38Z

@Meatballs1 great, thanks. please let me know if I can help in any way.

Meatballs1 · 2025-01-30T11:01:26Z

I have observed an exception when reconnecting to MQTT (#121), but this doesn't show up in your logs so I don't think it's related.

Meatballs1 · 2025-01-30T16:09:33Z

I think I reproduced the issue once or twice by restarting MQTT, but I couldn't do it reliably.

I've pushed the reconnect exception fix, and also a change to the logging, as it looks like some messages didn't get flushed to the log file. I was wondering if you had the exception but it never appeared because it wasn't being flushed.

Could you try the latest version and see if the issue continues?

I have a couple of 'attempted' fixes to try if that doesn't resolve the issue:

https://github.com/Meatballs1/hcpy-2.0/tree/thread_safe_publish

https://github.com/Meatballs1/hcpy-2.0/tree/reconnection_fix

Meatballs1 · 2025-01-30T19:43:28Z

Also interested to see if disabling ha_discovery makes any difference as that fires a lot of publish messages on connect.

nicx · 2025-01-31T08:17:27Z

@Meatballs1 I will try it with every new version and will give feedback ;) Thanks!
