hcpy stops working after a while until I reboot the hcpy docker container #120

Open
nicx opened this issue Jan 17, 2025 · 11 comments

Comments

@nicx

nicx commented Jan 17, 2025

When I start hcpy (in my case as a docker container), everything works as expected. I can monitor my dishwasher and washer, and I can start them. Great so far.

After some time it just stops working. I get no more value updates, I cannot start my washer or dishwasher anymore. After a restart of the hcpy docker container it immediately works again. In the log file the restart was at 2025-01-17 11:54:33.707763.

log.txt

I already tried switching from the DNS name to the IP address, but with no luck. The problem remains the same.

Is there anything I can do to find out the reason for that?

And as a workaround: is there anything (e.g. an HA sensor) that I could monitor to detect when the problem occurs? With that I could automate a restart of the container.

@Meatballs1
Collaborator

I get similar issues occasionally with my dishwasher: the socket appears to stay open but there's no further traffic from the device.

I think we are doing everything correctly to monitor and alert on socket timeouts etc., but I need to check whether it's actually staying open. I also wonder if the socket closing may not be signalled fully into docker, but that seems unlikely.

We potentially need some watchdog that restarts the connection if no traffic is seen for a certain amount of time. We could also attempt to send a status request to each device periodically.
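A rough sketch of the watchdog idea, in case it helps the discussion. This is not hcpy code: the class name `TrafficWatchdog`, the `feed`/`check` methods, and the `on_stale` callback are all hypothetical, and what the callback does (reconnect the device socket, send a status request, or exit so docker restarts the container) is left open.

```python
import threading
import time


class TrafficWatchdog:
    """Calls `on_stale` when no traffic has been recorded for `timeout` seconds."""

    def __init__(self, timeout, on_stale, clock=time.monotonic):
        self.timeout = timeout
        self.on_stale = on_stale      # hypothetical hook, e.g. reconnect the device
        self._clock = clock           # injectable so the logic can be tested
        self._last_seen = clock()
        self._lock = threading.Lock()

    def feed(self):
        """Record that a message just arrived from the device."""
        with self._lock:
            self._last_seen = self._clock()

    def check(self):
        """Fire `on_stale` if the device has been silent too long.

        Returns True when the callback fired.
        """
        with self._lock:
            silent_for = self._clock() - self._last_seen
            stale = silent_for >= self.timeout
            if stale:
                # Reset so the callback does not fire again on every check.
                self._last_seen = self._clock()
        if stale:
            self.on_stale(silent_for)
        return stale

    def run(self, interval, stop_event):
        """Poll `check` every `interval` seconds until `stop_event` is set."""
        while not stop_event.wait(interval):
            self.check()
```

The receive path would call `feed()` for every message from a device, and `run()` would live in its own daemon thread. The timeout would need to be generous, since an idle appliance may legitimately send nothing for a long time.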

@nicx
Author

nicx commented Jan 20, 2025

@Meatballs1 I have continued to observe the behavior. In my case, hcpy seems to lose the connection to the mosquitto MQTT broker at some point. Since I do a nightly restart of my docker containers for backup purposes, I have now excluded mosquitto and hcpy from this. I will continue to observe and give feedback.

@nicx
Author

nicx commented Jan 30, 2025

@Meatballs1 after a few more days of observing the behaviour, it seems that the connection is almost 100% stable if neither of the two docker containers is restarted. As soon as one of the two containers is restarted, the devices lose the connection after some time. It does not matter which of the containers is restarted, mqtt or hcpy.

Can we find out where exactly the problem is? Is it hcpy not doing an MQTT reconnect? Is it the devices closing the connection to hcpy? Is it maybe several different problems?

@Meatballs1
Collaborator

Meatballs1 commented Jan 30, 2025

I assume you mean the HA or the HCPY docker?

You don't see the ERROR MQTT client disconnected: in logs at all?

I think maybe adding a timeout to loop_forever may be what we are missing, as it should auto reconnect by default:

https://eclipse.dev/paho/files/paho.mqtt.python/html/client.html

edit: although it looks like it may default to 1s anyway?

@nicx
Author

nicx commented Jan 30, 2025

@Meatballs1 no, I mean the hcpy and mosquitto containers. I still reboot my HA container every night without problems. And no, I do not see any MQTT client disconnected errors.

@Meatballs1
Collaborator

Ah yes, I'm wondering if we need to use loop_start() to thread the MQTT client, or we have some other multithreading deadlock happening. Will need to investigate a bit.
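One way to rule out threading problems around the MQTT client would be to let only a single thread ever call into it. The sketch below is an illustration of that pattern, not the code in hcpy or in any of its branches: `SerializedPublisher` is a hypothetical name, and the injected `publish` callable is assumed to be something like paho's `client.publish`. Whether a race or deadlock is the actual cause here has not been established.

```python
import queue
import threading


class SerializedPublisher:
    """Funnels publishes from many threads through a single worker thread."""

    _STOP = object()  # sentinel that tells the worker to exit

    def __init__(self, publish):
        # `publish` is any callable taking (topic, payload). Passing the MQTT
        # client's publish method here is an assumption, not hcpy's wiring.
        self._publish = publish
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def publish(self, topic, payload):
        """Safe to call from any device thread; only enqueues the message."""
        self._queue.put((topic, payload))

    def _drain(self):
        while True:
            item = self._queue.get()
            if item is self._STOP:
                break
            topic, payload = item
            try:
                self._publish(topic, payload)
            except Exception as exc:
                # Keep the worker alive and make the failure visible at once.
                print(f"publish to {topic} failed: {exc}", flush=True)

    def close(self):
        """Send everything already queued, then stop the worker."""
        self._queue.put(self._STOP)
        self._worker.join()
```

Device threads would call `publisher.publish(topic, payload)` instead of touching the client directly, so a slow or blocked publish can no longer stall the thread that reads from the appliance.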

@nicx
Author

nicx commented Jan 30, 2025

@Meatballs1 great, thanks. please let me know if I can help in any way.

@Meatballs1
Collaborator

I have observed an exception when reconnecting to MQTT (#121), but this doesn't show up in your logs so I don't think it's related.

@Meatballs1
Collaborator

I think I reproduced the issue once or twice by restarting MQTT but I couldn't do it reliably.

I've pushed the reconnect exception fix, and also a change to the logging as it looks like some messages didn't get flushed to the log file. I was wondering if you had the exception but it never appeared as it wasn't being flushed.
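For anyone hitting the same missing-log symptom elsewhere: Python buffers `stdout` when it is not attached to a terminal, which is typically the case inside a container, so messages can sit in the buffer and never reach the log before a hang or restart. A minimal sketch of a logging helper that flushes every line follows; the `log` function name and its `stream` parameter are hypothetical and not taken from the hcpy source.

```python
import sys
from datetime import datetime


def log(*parts, stream=None):
    """Print a timestamped line and flush it immediately."""
    # Resolve stdout at call time so redirection done later is respected.
    stream = stream if stream is not None else sys.stdout
    # flush=True pushes the line out now instead of when the buffer fills.
    print(datetime.now(), *parts, file=stream, flush=True)
```

Running the interpreter with `python -u` or setting `PYTHONUNBUFFERED=1` in the container environment has a similar effect without code changes.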

Could you try the latest version and see if the problem continues?

I have a couple of 'attempted' fixes to try if that doesn't resolve the issue:

https://github.com/Meatballs1/hcpy-2.0/tree/thread_safe_publish

https://github.com/Meatballs1/hcpy-2.0/tree/reconnection_fix

@Meatballs1
Collaborator

Also interested to see if disabling ha_discovery makes any difference as that fires a lot of publish messages on connect.

@nicx
Author

nicx commented Jan 31, 2025

@Meatballs1 I will try it with every new version and will give feedback ;) Thanks!
