hcpy stops working after a while until I reboot the hcpy docker container #120
I occasionally get similar issues with my dishwasher: the socket appears to stay open, but there's no further traffic from the device. I think we are doing everything correctly to monitor and alert on socket timeouts etc., but I need to check whether it's actually staying open. I also wonder whether the socket closing might not be fully signalled into Docker, but that seems unlikely. We potentially need a watchdog that restarts the connection if no traffic is seen for a certain amount of time. Alternatively, we could periodically send a status request to each device.
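A minimal sketch of the watchdog idea, assuming hcpy can call a hook whenever a frame arrives from an appliance. `TrafficWatchdog`, `notify()`, `check()` and `on_stale` are hypothetical names, not part of hcpy. The clock is injectable so the timing logic can be checked without sleeping.

```python
import threading
import time
from typing import Callable


class TrafficWatchdog:
    """Fire on_stale() once no device traffic has been seen for `timeout` seconds.

    Hypothetical helper, not part of hcpy: call notify() whenever a frame
    arrives from an appliance, and call check() periodically.
    """

    def __init__(self, timeout: float, on_stale: Callable[[], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self.on_stale = on_stale
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self._lock = threading.Lock()
        self._last_seen = clock()

    def notify(self) -> None:
        """Record that the device just sent something."""
        with self._lock:
            self._last_seen = self.clock()

    def check(self) -> bool:
        """Return True and call on_stale() if the link has been idle too long."""
        with self._lock:
            stale = self.clock() - self._last_seen >= self.timeout
            if stale:
                # Restart the idle timer so a slow reconnect doesn't retrigger immediately.
                self._last_seen = self.clock()
        if stale:
            self.on_stale()  # called outside the lock so the callback may call notify()
        return stale
```

In hcpy, `on_stale` would close and reopen the appliance connection (or first try a status request). How to do that depends on hcpy's connection code, which isn't shown in this thread. If an idle appliance can legitimately go quiet for long stretches, the timeout needs to be generous.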
@Meatballs1 I have continued to observe the behavior. In my case, hcpy seems to lose the connection to the Mosquitto MQTT broker at some point. Since I do a nightly restart of my Docker containers for backup purposes, I have now excluded Mosquitto and hcpy from it. I will continue to observe and give feedback.
@Meatballs1 After a few more days of observing the behaviour, it seems that the connection is almost 100% stable if neither of the two Docker containers is restarted. As soon as one of them is restarted, the devices lose the connection after some time. It does not matter which container is restarted, MQTT or hcpy. Can we find out where exactly the problem is? Is it hcpy not doing an MQTT reconnect? Is it the devices closing the connection to hcpy? Or is it several different problems?
I assume you mean the HA or the hcpy Docker container? You don't see any MQTT client disconnected errors? I think adding a timeout to `loop_forever` may be what we are missing, as it should auto-reconnect by default: https://eclipse.dev/paho/files/paho.mqtt.python/html/client.html Edit: although it looks like it may default to 1s anyway?
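For context on the `loop_forever` question: in paho-mqtt, the `timeout` argument of `loop_forever()` (default 1.0 s) is how long each network-loop iteration waits for socket activity. The pause between automatic reconnect attempts is configured separately with `reconnect_delay_set(min_delay=1, max_delay=120)`. Per the paho documentation, the first reconnect waits `min_delay` seconds and the delay doubles on each further attempt up to `max_delay`. The sketch below only reproduces that schedule in plain Python, assuming those documented defaults; it does not call paho, and `reconnect_delays` is a hypothetical name.

```python
from itertools import islice
from typing import Iterator


def reconnect_delays(min_delay: float = 1, max_delay: float = 120) -> Iterator[float]:
    """Yield the wait (seconds) before each reconnect attempt.

    Mirrors the schedule described for paho's reconnect_delay_set():
    start at min_delay and double after each failed attempt, capped at max_delay.
    """
    delay = min_delay
    while True:
        yield delay
        delay = min(delay * 2, max_delay)


if __name__ == "__main__":
    # First nine waits with the assumed defaults.
    print(list(islice(reconnect_delays(), 9)))
```

With the defaults this gives 1, 2, 4, 8, 16, 32, 64, 120, 120 seconds, so a reconnect that keeps failing settles at one attempt every two minutes.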
@Meatballs1 No, I mean the hcpy and Mosquitto containers. I still reboot my HA container every night, despite our problems. And no, I do not see any MQTT client disconnected errors.
Ah yes, I'm wondering if we need to use …
@Meatballs1 Great, thanks. Please let me know if I can help in any way.
I have observed an exception when reconnecting to MQTT (#121), but it doesn't show up in your logs, so I don't think it's related.
I think I reproduced the issue once or twice by restarting MQTT, but I couldn't do it reliably. I've pushed the reconnect exception fix, plus a change to the logging, since it looks like some messages didn't get flushed to the log file. I was wondering whether you had the exception but it never appeared because it wasn't being flushed. Could you try the latest version and see if the problem continues? I have a couple of 'attempted' fixes to try if that doesn't resolve the issue: https://github.com/Meatballs1/hcpy-2.0/tree/thread_safe_publish and https://github.com/Meatballs1/hcpy-2.0/tree/reconnection_fix
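What those branches actually contain isn't described in this thread. As a rough illustration of one common pattern for thread-safe publishing, and not necessarily what `thread_safe_publish` does, the sketch below funnels every publish from the per-appliance threads through a single queue drained by one worker thread, so the MQTT client is only used from that thread. `SerializedPublisher` is a hypothetical name, and `publish` is any callable taking `(topic, payload)`, for example a thin wrapper around paho's `client.publish`.

```python
import logging
import queue
import threading
from typing import Callable, Optional, Tuple

log = logging.getLogger(__name__)

Message = Tuple[str, str]  # (topic, payload)


class SerializedPublisher:
    """Hypothetical wrapper: any thread may enqueue, one worker thread publishes."""

    def __init__(self, publish: Callable[[str, str], None]) -> None:
        # e.g. publish=lambda topic, payload: client.publish(topic, payload)
        self._publish = publish
        self._queue: "queue.Queue[Optional[Message]]" = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def publish(self, topic: str, payload: str) -> None:
        """Safe to call from any thread; only the worker touches the client."""
        self._queue.put((topic, payload))

    def close(self) -> None:
        """Stop the worker after everything queued so far has been published."""
        self._queue.put(None)  # sentinel
        self._worker.join()

    def _run(self) -> None:
        while True:
            item = self._queue.get()
            if item is None:
                break
            topic, payload = item
            try:
                self._publish(topic, payload)
            except Exception:
                # Log and keep going so one failed publish doesn't kill the worker.
                log.exception("publish to %s failed", topic)
```

The queue here is unbounded, so if the publish callable blocks for a long time, messages accumulate in memory rather than being dropped.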
I'm also interested to see whether disabling `ha_discovery` makes any difference, as that fires a lot of publish messages on connect.
@Meatballs1 I will try every new version and give feedback ;) Thanks!
When I start hcpy (in my case as a Docker container), everything works as expected. I can monitor my dishwasher and washer, and I can start them. Great so far.
After some time, it just stops working. I get no more value updates, and I can no longer start my washer or dishwasher. After a restart of the hcpy Docker container, it immediately works again. In the log file, the restart was at 2025-01-17 11:54:33.707763.
log.txt
I already tried switching from the DNS name to the IP address, but with no luck. The problem remains the same.
Is there anything I can do to find out the reason for that?
And as a workaround: is there anything (e.g. an HA sensor) that I could monitor to detect when the problem occurs? With that, I could automate a restart of the container.
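One way to get such a signal, sketched under assumptions: have hcpy record when it last heard from any appliance, and let a Docker health check flag the container once that timestamp goes stale. Everything here is hypothetical: the heartbeat file path, the 30-minute threshold, and the idea that hcpy touches the file on every incoming message (as far as this thread shows, that would need a code change). An idle appliance may legitimately go quiet, so the threshold has to be generous.

```python
import os
import time
from typing import Optional

# Hypothetical: assumes hcpy (or a small patch to it) touches this file every
# time a message arrives from any appliance. Not an existing hcpy feature as
# far as this thread shows.
HEARTBEAT_FILE = "/tmp/hcpy_heartbeat"
MAX_AGE_SECONDS = 30 * 60


def is_stale(path: str, max_age: float, now: Optional[float] = None) -> bool:
    """True if the heartbeat file is missing or was last touched over max_age seconds ago."""
    if now is None:
        now = time.time()
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        return True  # no heartbeat at all counts as unhealthy
    return now - mtime > max_age


def main() -> int:
    """Exit code for a Docker HEALTHCHECK: 0 = healthy, 1 = unhealthy."""
    return 1 if is_stale(HEARTBEAT_FILE, MAX_AGE_SECONDS) else 0
```

`main()`'s return value would serve as the exit code of a `HEALTHCHECK` command. Standalone Docker only marks the container as unhealthy; it does not restart it on its own, so something external (a Home Assistant automation, a cron job running `docker restart`, or a similar tool) still has to act on that status.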