Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MQTT not reconnecting if disconnected from broker #31057

Closed
billimek opened this issue Jan 21, 2020 · 14 comments · Fixed by #34556
Closed

MQTT not reconnecting if disconnected from broker #31057

billimek opened this issue Jan 21, 2020 · 14 comments · Fixed by #34556

Comments

@billimek
Copy link

Home Assistant release with the issue: 0.104.3

Last working Home Assistant release (if known): not known

Operating environment (Hass.io/Docker/Windows/etc.): docker

Integration: MQTT Broker

Description of problem:

When the external MQTT broker service is restarted, Home Assistant doesn't seem to reconnect, or seems to give up too quickly. The only remedy to re-establish connection with the MQTT broker is to restart Home Assistant itself which seems unnecessary.

I see the following in the home assistant logs, which suggests that it reconnects and then one second later sees a connection refused and then appears to stop trying anything further related to MQTT:

2020-01-21 17:17:47 INFO (Thread-12) [homeassistant.components.mqtt] Successfully reconnected to the MQTT server
2020-01-21 17:17:48 ERROR (Thread-12) [homeassistant.components.mqtt] Unable to connect to the MQTT broker: Connection Refused: broker unavailable.

For comparison, I observed the behavior of three other different MQTT client applications at the same time the MQTT broker was being restarted:

  • node-red: disconnects and then reconnects,
21 Jan 12:17:47 - [info] [mqtt-broker:mqtt] Disconnected from broker: mqtt://10.2.0.102:1883
21 Jan 12:18:03 - [info] [mqtt-broker:mqtt] Connected to broker: mqtt://10.2.0.102:1883
  • teslamate: disconnects and then reconnects,
12:17:47.611 [warn] MQTT connection has been dropped
12:17:47.620 [warn] MQTT Client has been terminated with reason: {:function_clause, [{TeslaMate.Mqtt.Handler, :connection, [:terminated, []], [file: 'lib/teslamate/mqtt/handler.ex', line: 7]}, {Tortoise.Handler, :execute, 2, [file: 'lib/tortoise/handler.ex', line: 314]}, {Tortoise.Connection.Controller, :handle_info, 2, [file: 'lib/tortoise/connection/controller.ex', line: 215]}, {:gen_server, :try_dispatch, 4, [file: 'gen_server.erl', line: 637]}, {:gen_server, :handle_msg, 6, [file: 'gen_server.erl', line: 711]}, {:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl', line: 249]}]}
12:18:33.507 [info] MQTT connection has been established
  • zwave2mqtt: disconnects and then reconnects,
2020-01-21T17:17:47.611Z z2m:Mqtt MQTT client offline
2020-01-21T17:17:47.612Z z2m:Mqtt MQTT client closed
2020-01-21T17:18:47.626Z z2m:Mqtt MQTT client reconnecting
2020-01-21T17:18:47.646Z z2m:Mqtt MQTT client connected

Perhaps Home Assistant needs some sort of retry/backoff logic in situations when it loses connection to an MQTT broker instead of giving up the first time it fails to reconnect? If I can puzzle this out, I'm happy to raise a PR.

Problem-relevant configuration.yaml entries and (fill out even if it seems unimportant):

mqtt:
  discovery: true
  discovery_prefix: homeassistant
  broker: vernemq
  username: !secret mqtt_user
  password: !secret mqtt_password
  birth_message:
    topic: 'hass/status'
    payload: 'online'
  will_message:
    topic: 'hass/status'
    payload: 'offline'

Additional information:

I already looked though #8589 from 2-3 years ago as well as #10133 but Home-Assistant doesn't appear to be reconnecting in situations when the MQTT broker restarts.

@probot-home-assistant
Copy link

Hey there @home-assistant/core, mind taking a look at this issue as its been labeled with a integration (mqtt) you are listed as a codeowner for? Thanks!

@billimek
Copy link
Author

Suspect the behavior is related to the phao.mqtt lib after looking at this issue and this issue.

@refigh
Copy link

refigh commented Apr 2, 2020

Suspect the behavior is related to the phao.mqtt lib after looking at this issue and this issue.

could you find any solution? I still suffer from this.

@billimek
Copy link
Author

No solution yet as far as I can tell. It's really annoying because any time my mqtt broker restarts, home-assistant 'gives up' immediately instead of properly attempting reconnects.

The only recourse is to manually restart home-assistant to force a fresh connection to the mqtt broker. It's a non-ideal user experience.

@billimek
Copy link
Author

@balloob do you happen to have any thoughts on this or hints about digging into a root-cause?

I can pretty easily reproduce this issue simply by restarting my mqtt broker.

@billimek
Copy link
Author

Looking through the code, it is apparent that _mqtt_on_disconnect is being called:

    def _mqtt_on_disconnect(self, _mqttc, _userdata, result_code: int) -> None:
        """Disconnected callback."""
        self.connected = False

        # When disconnected because of calling disconnect()
        if result_code == 0:
            return

        tries = 0

        while True:
            try:
                if self._mqttc.reconnect() == 0:
                    self.connected = True
                    _LOGGER.info("Successfully reconnected to the MQTT server")
                    break
            except OSError:
                pass

            wait_time = min(2 ** tries, MAX_RECONNECT_WAIT)
            _LOGGER.warning(
                "Disconnected from MQTT (%s). Trying to reconnect in %s s",
                result_code,
                wait_time,
            )
            # It is ok to sleep here as we are in the MQTT thread.
            time.sleep(wait_time)
            tries += 1

.. The observed behavior from home-assistant logs is that Home Assistant first logs,

Successfully reconnected to the MQTT server

which seems to correspond with self._mqttc.reconnect() evaluating true.

Immediately after that home-assistant logs,

Unable to connect to the MQTT broker: Connection Refused: broker unavailable.

Which corresponds to a call in _mqtt_on_connect:

        import paho.mqtt.client as mqtt

        if result_code != mqtt.CONNACK_ACCEPTED:
            _LOGGER.error(
                "Unable to connect to the MQTT broker: %s",
                mqtt.connack_string(result_code),
            )
            self._mqttc.disconnect()
            return

What appears to be happening is that the paho.mqtt.client reconnect() call is erroneously returning true when a disconnect happens, followed by home-assistant trying to handle a successful reconnect, which it is not in reality, hence the connection refused error from the broker.

It seems to reason that the reconnect() call return should not be trusted, or further reconnect logic is needed even after a successful reconnect() return from phao.

What do you all think?

@billimek
Copy link
Author

billimek commented Apr 21, 2020

I did a lot of different local testing with changes implemented in mqtt/__init__.py to try different approaches to identify a 'fix' for this. None were successful:

  • Attempting to leverage the is_connected method from paho both before and after the self._mqttc.reconnect() call didn't seem to yield any different behavior. - is_connected() seems to think that it has a connection, but _mqtt_on_connect is expressing a connection refused, so clearly it isn't connected.
  • Attempting to add a slight delay before calling reconnect() didn't help
  • Attempting to not even try the reconnect() logic - assuming paho is handling reconnecting internally didn't help
  • Attempting to force home-assistant to connect to the mqtt via websockets (overriding the Client initializer transport from the default "tcp" to "websockets"), but this didn't help

@emontnemery
Copy link
Contributor

emontnemery commented Apr 22, 2020

@billimek Which external MQTT server is used, what version of it, and how do you restart it?
Is the failure to reconnect 100% reproducible, or does it only happens sometimes?

@billimek
Copy link
Author

billimek commented Apr 22, 2020

@erikkastelec the external MQTT server/broker is vernemq 1.10.2 (latest) running in cluster (high availability) mode with 3 replicas. They are running as three different kubernetes pods and the restart is to restart one at a time until each are healthy before restarting the others. Home Assistant, and all my other MQTT clients, connect to it via a single LoadBalancerIP. I can say with confidence that I can reproduce this 100% of the time.

Interestingly, the unexpected behavior with restarts of the vernemq MQTT server are not present with all of the other MQTT client library implementations that I'm also running. In other words, these other MQTT clients properly detect a down connection and attempt to reconnect in an expected way when a multi-node vernemq MQTT cluster is restarted:

Since my last comment, I did further experiments whereby I ran vernemq as just a single-replica MQTT server and the restart behavior from the paho client (Home Assistant) was different: the disconnect resulting from a restart went down the correct path of reconnecting attempts until the vernemq server was able to handle requests once again.

@emontnemery
Copy link
Contributor

Thanks!

The reconnect logic in Home Assistant does not seem correct, reconnect() returning true does not mean a connection is established to the MQTT server, it just means a connect request has been sent to the MQTT server.

I'll have a stab at improving it, it would be great if you could help test it.

@emontnemery
Copy link
Contributor

Would you mind giving this a try: emontnemery@c265c1e ?

@billimek
Copy link
Author

Hey, that did it!

When the vernemq cluster pods began to restart, the homeassistant client connection dropped as expected. The new code yielded the following log statements in home-assistant,

2020-04-22 20:31:55 WARNING (Thread-12) [homeassistant.components.mqtt] Disconnected from MQTT (%s).
2020-04-22 20:31:56 ERROR (Thread-12) [homeassistant.components.mqtt] Unable to connect to the MQTT broker: Connection Refused: broker unavailable.
2020-04-22 20:31:56 WARNING (Thread-12) [homeassistant.components.mqtt] Disconnected from MQTT (%s).
2020-04-22 20:31:58 ERROR (Thread-12) [homeassistant.components.mqtt] Unable to connect to the MQTT broker: Connection Refused: broker unavailable.
2020-04-22 20:31:58 WARNING (Thread-12) [homeassistant.components.mqtt] Disconnected from MQTT (%s).
2020-04-22 20:32:02 ERROR (Thread-12) [homeassistant.components.mqtt] Unable to connect to the MQTT broker: Connection Refused: broker unavailable.
2020-04-22 20:32:02 WARNING (Thread-12) [homeassistant.components.mqtt] Disconnected from MQTT (%s).
2020-04-22 20:32:10 ERROR (Thread-12) [homeassistant.components.mqtt] Unable to connect to the MQTT broker: Connection Refused: broker unavailable.
2020-04-22 20:32:10 WARNING (Thread-12) [homeassistant.components.mqtt] Disconnected from MQTT (%s).

And home assistant automatically reconnected and is behaving as expected. I assume paho has reconnect logic built-in.

This looks great! I did add a comment to the commit about the log message.

@emontnemery
Copy link
Contributor

Great, thanks for testing!

Right, paho has reconnect logic built in, and the reconnect logic in HA was broken.

@rob1998
Copy link

rob1998 commented Mar 31, 2022

Seems like reconnection is no longer working. Anyone else having this problem?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants