Bridge connection loops messages back forever #63

Open
mspoehr opened this issue Apr 2, 2020 · 16 comments

@mspoehr

mspoehr commented Apr 2, 2020

I am using the emqx-bridge-mqtt plugin to bridge EMQX to an AWS IoT endpoint. Occasionally (seemingly at random, on emqx start) the bridge connection starts spamming the same messages over and over until the service is restarted. This happens roughly 25% of the time when emqx starts up.

I am using emqx version 4.0.5, with this plugin configured to be loaded on startup (via /var/lib/emqx/loaded_plugins) on Ubuntu Linux 18.04.

Below is an excerpt from the log when this issue occurs.

2020-04-01 13:43:22.253 [warning] <<"someclientid">>@127.0.0.1:40844 [Session] Dropped msg due to mqueue is full: Message(Id=<binary message ID>, QoS=1, Topic=aws/some/topic/structure, From=bridge, Flags=[], Headers=)
...
2020-04-01 13:43:22.253 [error] [Bridge] Can't be found from the inflight:45091

Those messages can be seen repeatedly with different identifiers and topics.

The following is the emqx_bridge_mqtt.conf being used:

bridge.mqtt.aws.address = xxxxxxxxxxxxxx-ats.iot.us-west-2.amazonaws.com:8883
bridge.mqtt.aws.proto_ver = mqttv4
bridge.mqtt.aws.start_type = auto
bridge.mqtt.aws.bridge_mode = true
bridge.mqtt.aws.clientid = someremoteclientid
bridge.mqtt.aws.clean_start = true
bridge.mqtt.aws.forwards = cloud/#
bridge.mqtt.aws.forward_mountpoint = some/topic/structure
bridge.mqtt.aws.subscription.1.topic = some/topic/structure/cloud/#
bridge.mqtt.aws.subscription.1.qos = 1
bridge.mqtt.aws.receive_mountpoint = aws/
bridge.mqtt.aws.ssl = on
bridge.mqtt.aws.cacertfile = /path/to/AmazonRootCA1.pem
bridge.mqtt.aws.certfile = /path/to/id_rsa.crt
bridge.mqtt.aws.keyfile = /path/to/id_rsa.key
bridge.mqtt.aws.ciphers = ECDHE-ECDSA-AES256-GCM-SHA384,ECDHE-RSA-AES256-GCM-SHA384
bridge.mqtt.aws.keepalive = 60s
bridge.mqtt.aws.tls_versions = tlsv1.2

You may notice I am bridging both to and from cloud/# on the bridge connection. I would expect a single loopback of all bridged messages if any clients subscribe locally, and that is exactly what happens the 75% of the time when emqx is not spamming messages. Could this be causing the issue the other 25% of the time? Are there any config recommendations, or is this a bug in emqx?

@mspoehr
Author

mspoehr commented Apr 13, 2020

We've changed our IoT rules to also accept messages on a topic structure separate from the one being subscribed to. This is the resulting config:

bridge.mqtt.aws.address = xxxxxxxxxxxxxx-ats.iot.us-west-2.amazonaws.com:8883
bridge.mqtt.aws.proto_ver = mqttv4
bridge.mqtt.aws.start_type = auto
bridge.mqtt.aws.bridge_mode = true
bridge.mqtt.aws.clientid = someremoteclientid
bridge.mqtt.aws.clean_start = true
bridge.mqtt.aws.forwards = to-aws/#
bridge.mqtt.aws.forward_mountpoint = some/topic/structure
bridge.mqtt.aws.subscription.1.topic = some/topic/structure/cloud/#
bridge.mqtt.aws.subscription.1.qos = 1
bridge.mqtt.aws.receive_mountpoint = aws/
bridge.mqtt.aws.ssl = on
bridge.mqtt.aws.cacertfile = /path/to/AmazonRootCA1.pem
bridge.mqtt.aws.certfile = /path/to/id_rsa.crt
bridge.mqtt.aws.keyfile = /path/to/id_rsa.key
bridge.mqtt.aws.ciphers = ECDHE-ECDSA-AES256-GCM-SHA384,ECDHE-RSA-AES256-GCM-SHA384
bridge.mqtt.aws.keepalive = 60s
bridge.mqtt.aws.tls_versions = tlsv1.2

With the config in the previous comment I expected a single loopback 100% of the time, but instead got infinite loopback some percentage of the time. With this new config, I don't expect any loopback, ever. I'm still seeing the same issue with infinite loopback. This tells me that the issue does not have anything to do with attempting to send and receive from the same topic structure, as sending to some/topic/structure/to-aws and subscribing to some/topic/structure/cloud should be completely disjoint.

I was able to restart emqx a (seemingly) random number of times to get the issue to go away.

Any thoughts on other config options that could be causing this?

@turtleDeng
Member

turtleDeng commented Apr 16, 2020

There is a problem with your configuration that causes messages to be sent in a loop:

bridge.mqtt.aws.forwards = cloud/#
bridge.mqtt.aws.forward_mountpoint = some/topic/structure

bridge.mqtt.aws.subscription.1.topic = some/topic/structure/cloud/#
bridge.mqtt.aws.subscription.1.qos = 1

EMQX sends the bridged messages to AWS IoT on the some/topic/structure/cloud/# topic.

But you have also configured:

bridge.mqtt.aws.subscription.1.topic = some/topic/structure/cloud/#
bridge.mqtt.aws.subscription.1.qos = 1
which subscribes to some/topic/structure/cloud/# on AWS IoT, so the messages come straight back and loop.
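The topic mapping described above can be checked mechanically. The sketch below is a hypothetical bash model, not emqx code: `mqtt_match` implements the standard MQTT wildcard rules (`+` matches exactly one level, a trailing `#` matches the parent level and everything below it), and `mount_topic` assumes a mountpoint is joined to the topic with a `/` when it does not already end in one, as the explanation above implies (`cloud/#` under `some/topic/structure` becomes `some/topic/structure/cloud/#`). All function and variable names are illustrative.

```bash
#!/usr/bin/env bash
# Hypothetical model of the bridge's topic mapping; not emqx code.

# mqtt_match FILTER TOPIC: exit status 0 if TOPIC matches the MQTT topic FILTER.
mqtt_match() {
  local f_rest=$1 t_rest=$2 f_head t_head f_more t_more
  while :; do
    f_head=${f_rest%%/*}   # first level of the filter
    t_head=${t_rest%%/*}   # first level of the topic
    # '#' matches this level and every level below it
    [ "$f_head" = "#" ] && return 0
    # '+' matches any single level; any other level must match exactly
    if [ "$f_head" != "+" ] && [ "$f_head" != "$t_head" ]; then
      return 1
    fi
    # drop the level just compared, remembering whether more levels remain
    f_more=no; t_more=no
    case $f_rest in */*) f_more=yes; f_rest=${f_rest#*/} ;; esac
    case $t_rest in */*) t_more=yes; t_rest=${t_rest#*/} ;; esac
    if [ "$t_more" = no ]; then
      # topic exhausted: match if the filter is too, or only a trailing '#' is left
      [ "$f_more" = no ] || [ "$f_rest" = "#" ]
      return
    fi
    # filter exhausted while the topic still has levels
    [ "$f_more" = no ] && return 1
  done
}

# mount_topic MOUNTPOINT TOPIC: prefix TOPIC with MOUNTPOINT.
# Assumption: a '/' is inserted when the mountpoint does not already end in one.
mount_topic() {
  case $1 in
    '') printf '%s\n' "$2" ;;
    */) printf '%s%s\n' "$1" "$2" ;;
    *) printf '%s/%s\n' "$1" "$2" ;;
  esac
}

# trace_round_trip LOCAL_TOPIC: follow one locally published message through the bridge.
trace_round_trip() {
  local local_topic=$1 remote local_again
  if ! mqtt_match "$FORWARDS" "$local_topic"; then
    echo "not forwarded"; return
  fi
  remote=$(mount_topic "$FORWARD_MOUNTPOINT" "$local_topic")
  if ! mqtt_match "$SUBSCRIPTION" "$remote"; then
    echo "forwarded as $remote; not received back"; return
  fi
  local_again=$(mount_topic "$RECEIVE_MOUNTPOINT" "$remote")
  if mqtt_match "$FORWARDS" "$local_again"; then
    echo "forwarded as $remote; received back as $local_again; forwarded again (loop)"
  else
    echo "forwarded as $remote; received back once as $local_again"
  fi
}

# Values from the first configuration in this issue
FORWARDS='cloud/#'
FORWARD_MOUNTPOINT='some/topic/structure'
SUBSCRIPTION='some/topic/structure/cloud/#'
RECEIVE_MOUNTPOINT='aws/'

trace_round_trip cloud/test
# forwarded as some/topic/structure/cloud/test; received back once as aws/some/topic/structure/cloud/test
```

By these topic rules alone, the first config produces exactly one loopback, because the `aws/` receive mountpoint keeps the returned message from matching `cloud/#` again. Setting `FORWARDS='to-aws/#'` (the second config) makes `trace_round_trip to-aws/test` report `forwarded as some/topic/structure/to-aws/test; not received back`, which is consistent with mspoehr's point that the topic overlap alone does not explain the endless repetition.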

@mspoehr
Author

mspoehr commented Apr 17, 2020

Thanks for the response. Even after changing the configuration so that there isn't a loop, I still see this exact same issue. Ideally I'd be able to send to and receive from the bridge on the same topic structure, but it isn't a deal breaker if this isn't possible.

My config now contains:

bridge.mqtt.aws.forwards = to-aws/#
bridge.mqtt.aws.forward_mountpoint = some/topic/structure
bridge.mqtt.aws.subscription.1.topic = some/topic/structure/cloud/#

Messages should be sent to AWS on some/topic/structure/to-aws, and received from the subscription some/topic/structure/cloud. With this new config, I still see the same issue.

I was able to find some more information while debugging as well:

  • I found that sending several messages in quick succession reliably reproduces this issue.
    • In bash, using mosquitto-clients: for i in $(seq 10); do mosquitto_pub -t to-aws/test -m "{ \"content\": \"$i\" }"; done
    • Sending < 5 messages quickly seems to never reproduce the issue.
    • Sending 5-10 messages quickly seems to only sometimes reproduce the issue.
    • Sending 10+ messages quickly almost always reproduces the issue (and sending 100+ reproduces 100% of the time)
  • The quality of service of published messages appears not to matter.
  • I did a test with mosquitto in place of AWS IoT, and emqx appeared to work just fine with mosquitto.
    • I tried with both a secure and insecure connection from emqx to mosquitto, with the secure connection trying to replicate as closely as possible how we connect to IoT.
    • Sending 1000's of messages when connected to mosquitto as fast as possible did not cause the looping issue.
    • A third client subscribed directly to the mosquitto broker correctly received those 1000 messages, and then no more.

Thus, the issue is not so much looping as that sending too many messages in quick succession to AWS IoT leaves the bridge in some sort of bad state.

@qingchuwudi
Contributor

Maybe it is caused by retransmission.

          qos1 +-------+                 qos2 +-------+                 qos3
Publisher ---> | Node1 | --Bridge Forward---> | Node2 | --Bridge Forward---> Subscriber
               +-------+                      +-------+
  • qos1: The QoS of messages from Publisher to Node1
  • qos2: The QoS of messages from Node1 to Node2 over the bridge
    Its value is '1'.
    A message will be retransmitted when its ack packet is not received before the timeout.
  • qos3: The QoS of messages from Node2 to Subscriber
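The retransmission hypothesis rests on how a QoS 1 sender behaves: it keeps each PUBLISH in an inflight table until the matching PUBACK arrives, and sends it again (with the DUP flag) if none does. The toy model below (bash 4+, hypothetical function names, not emqx's implementation) shows both halves: an unacknowledged message is resent, and a PUBACK whose packet id is no longer tracked can only be reported, which appears to be the situation the `[Bridge] Can't be found from the inflight:45091` log line in the original report describes. The `retry` call stands in for whatever timeout or reconnect triggers resending.

```bash
#!/usr/bin/env bash
# Toy model of a QoS 1 sender's inflight window; not emqx's implementation.
declare -A inflight=()   # packet id -> topic, awaiting PUBACK

send() {    # send PACKET_ID TOPIC: publish and keep the message until acknowledged
  inflight[$1]=$2
  echo "PUBLISH id=$1 topic=$2"
}

puback() {  # puback PACKET_ID: the peer acknowledged a packet
  if [[ -n ${inflight[$1]+x} ]]; then
    unset "inflight[$1]"
  else
    echo "PUBACK id=$1 not found in inflight"
  fi
}

retry() {   # retry: resend everything still unacknowledged, with DUP set
  local id
  for id in $(printf '%s\n' "${!inflight[@]}" | sort -n); do
    echo "PUBLISH id=$id topic=${inflight[$id]} dup=1"
  done
}

send 1 to-aws/test
send 2 to-aws/test
puback 1
puback 45091   # an id the sender is not tracking
retry          # id 2 was never acknowledged, so it goes out again
# PUBLISH id=1 topic=to-aws/test
# PUBLISH id=2 topic=to-aws/test
# PUBACK id=45091 not found in inflight
# PUBLISH id=2 topic=to-aws/test dup=1
```

In this model, retransmission alone repeats a message only while it stays unacknowledged; an endless stream would additionally need the acknowledgements to keep going missing or unmatched.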

@mspoehr
Author

mspoehr commented Apr 23, 2020

I had initially thought the same, but I'm not sure we know definitively that qos2 is '1'. Since my latest config has completely disjoint publish and subscribe topics, the QoS of 1 on the subscribed topics should not affect the QoS at which published messages are sent out.

In my bash example above, mosquitto_pub sends messages with QoS 0 by default. Therefore, I would expect both qos1 and qos2 to be '0'.

I'm not sure what qos3 was during my testing. I would like to say that I tested with both 0 and 1, but I'm not 100% sure about that.
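This reasoning relies on the MQTT 3.1.1 rule that a broker delivers a message at the minimum of its published QoS and the subscription's granted QoS, so a QoS 1 subscription cannot turn a QoS 0 publish into a QoS 1 delivery. A minimal sketch of that rule follows; `delivered_qos` is a hypothetical name, and the optional third argument models emqx's `zone.external.upgrade_qos` option, assumed here to deliver at the maximum of the two instead.

```bash
#!/usr/bin/env bash
# delivered_qos PUB_QOS SUB_QOS [UPGRADE]: QoS at which a broker delivers a message.
# Default: MQTT 3.1.1 rule, min(published QoS, granted subscription QoS).
# UPGRADE=true: assumed behaviour of emqx's upgrade_qos option, max of the two.
delivered_qos() {
  local pub=$1 sub=$2 upgrade=${3:-false}
  if [ "$upgrade" = true ]; then
    echo $(( pub > sub ? pub : sub ))
  else
    echo $(( pub < sub ? pub : sub ))
  fi
}

delivered_qos 0 1        # prints 0: a QoS 1 subscription does not raise a QoS 0 publish
delivered_qos 0 1 true   # prints 1, only if upgrade_qos behaves as assumed
```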

@turtleDeng
Member

@saumilsdk

I am also having the same problem. I use AWS IoT as the broker, and the emqx bridge lets devices using the MQTT-SN protocol send data through it. The same data comes back on each publish.

I also have MQTT-based devices that send data directly to AWS IoT, and that data needs to reach the MQTT-SN-based devices running behind the emqx bridge.

@saumilsdk

@mspoehr or @turtleDeng, can you please help resolve the looping when the bridge subscribes to the same topics it publishes to? I am connecting the bridge to an AWS IoT endpoint.

@mspoehr
Author

mspoehr commented May 12, 2020

Quoting an earlier reply: "You can refer to https://docs.emqx.io/broker/latest/en/configuration/configuration.html#zoneexternalupgradeqos"

I really don't think this is a QoS issue. This issue occurs when using any combination of QoS values, even with all 0's, which should never cause this.

Quoting @saumilsdk: "Can you please help in resolving looping in case bridge is subscribing same topics as publishing?"

@saumilsdk I am not sure that this is possible with emqx in its current state. This issue seems like a bug in emqx to me. In my case, I was able to configure my publishing and bridge subscriptions to be completely disjoint, and I still received the same messages looped back forever.

If you're not seeing messages looped back forever, but instead receive each message you publish back exactly once, I would actually expect that behavior.

@saumilsdk

saumilsdk commented May 13, 2020

@mspoehr Hi, I would agree with you if I only got the same message twice, but here I am stuck with it looping forever and end up restarting the server every time. I can find no way out of this issue, so any help will be appreciated. Here is my bridge config. I am using the EMQX-SN plugin as a gateway and EMQX-BRIDGE to bridge the gateway to the AWS IoT broker.

@qingchuwudi and @turtleDeng If you guys can also look into this.

bridge.mqtt.emqx2.start_type = auto
bridge.mqtt.emqx2.address = a3itfXXXX.iot.us-east-1.amazonaws.com:8883
bridge.mqtt.emqx2.proto_ver = mqttv4
bridge.mqtt.emqx2.clientid = bridge_emqx2
bridge.mqtt.emqx2.clean_start = true
bridge.mqtt.emqx2.ssl = on
bridge.mqtt.emqx2.cacertfile = /etc/mqtt/certs/rootCA.pem
bridge.mqtt.emqx2.certfile = /etc/mqtt/certs/cert.crt
bridge.mqtt.emqx2.keyfile = /etc/mqtt/certs/private.key
bridge.mqtt.emqx2.ciphers = ECDHE-ECDSA-AES256-GCM-SHA384,ECDHE-RSA-AES256-GCM-SHA384
PSK-AES128-CBC-SHA,PSK-AES256-CBC-SHA,PSK-3DES-EDE-CBC-SHA,PSK-RC4-SHA
bridge.mqtt.emqx2.keepalive = 60s
bridge.mqtt.emqx2.tls_versions = tlsv1.2,tlsv1.1,tlsv1
bridge.mqtt.emqx2.forwards = #
bridge.mqtt.emqx2.subscription.1.topic = #
bridge.mqtt.emqx2.subscription.1.qos = 1
bridge.mqtt.emqx2.reconnect_interval = 30s
bridge.mqtt.emqx2.retry_interval = 20s
bridge.mqtt.emqx2.max_inflight_size = 32

@mspoehr
Author

mspoehr commented May 13, 2020

@saumilsdk I'm not sure if your use case will work with this, but you could try adding a receive_mountpoint just to see if it helps. In my case, I had:

bridge.mqtt.aws.receive_mountpoint = aws/

But this still didn't fix the issue for me. I can see how, in your case, emqx could loop back infinitely if you are bridging # in both directions with no prefixes on either side. Still, for a "bridge" plugin, this seems like it should be a supported use case, but it seems that it is not.

@saumilsdk

@mspoehr I had tried adding both mountpoints, but it seems the looping still happens, and the topic prefix also keeps getting added to the looped messages. As you know, I am not running the emqx broker, only emqx-sn and emqx-bridge. What options do we have for these to disable looping?

bridge.mqtt.emqx2.forward_mountpoint = tmp/forward/aws/
bridge.mqtt.emqx2.receive_mountpoint = tmp/receive/aws/
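The growing prefix follows from these two mountpoints if every topic matches both `forwards = #` and `subscription.1.topic = #`. The sketch below is a hypothetical model, not emqx code: it assumes each forward prepends `forward_mountpoint`, AWS IoT echoes the message back on the same topic to the bridge's `#` subscription, and each receive prepends `receive_mountpoint` before the message is published locally and forwarded again. Because `#` matches every topic, no filter check is needed; the hop limit is the only thing stopping the loop.

```bash
#!/usr/bin/env bash
# Hypothetical model of a bridge whose forwards and subscription are both '#'.
FORWARD_MOUNTPOINT='tmp/forward/aws/'
RECEIVE_MOUNTPOINT='tmp/receive/aws/'

# trace_loop TOPIC HOPS: print the local topic after each pass through the bridge
trace_loop() {
  local topic=$1 hops=$2 i
  for (( i = 1; i <= hops; i++ )); do
    topic="${FORWARD_MOUNTPOINT}${topic}"   # forwarded to AWS IoT (matches forwards '#')
    topic="${RECEIVE_MOUNTPOINT}${topic}"   # echoed back via the '#' subscription, re-published locally
    echo "pass $i: $topic"
  done
}

trace_loop sensor/1 2
# pass 1: tmp/receive/aws/tmp/forward/aws/sensor/1
# pass 2: tmp/receive/aws/tmp/forward/aws/tmp/receive/aws/tmp/forward/aws/sensor/1
```

Setting both mountpoints to empty strings in the model leaves the topic unchanged on every pass, which corresponds to the earlier config with no mountpoints: the same message circulates on the same topic indefinitely.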

@gbunel29

Did you find a solution for this issue?
I'm also facing the same issue with the bridge.

@saumilsdk

@gbunel29 I have moved from emqx to the Paho MQTT-SN gateway, which doesn't have the loopback issue.
@mspoehr FYI

turtleDeng assigned wwhai and unassigned qingchuwudi on Jun 1, 2020
@wwhai

wwhai commented Jun 5, 2020

Quoting @saumilsdk: "@mspoehr I had tried adding both mountpoints, but it seems the looping still happens, and the topic prefix also keeps getting added to the looped messages. As you know, I am not running the emqx broker, only emqx-sn and emqx-bridge. What options do we have for these to disable looping?"

bridge.mqtt.emqx2.forward_mountpoint = tmp/forward/aws/
bridge.mqtt.emqx2.receive_mountpoint = tmp/receive/aws/

———
Messages will loop when the publish topic is the same as the subscribe topic. I suggest changing your topics, for example:

  • pub:/pub/etc/...
  • sub:/sub/etc/…

Adding a prefix or suffix may avoid this problem.

@Trance-Paradox

This looping error is occurring again. Messages are looping forever when published on the same topic.

7 participants