[Bug] Program stops if alive_data acknowledge fails only once #140

TekyTeky · 2023-06-22T07:09:41Z

Describe the bug
While running in Random mode, both trains stopped and no error showed

To reproduce
This has happened previously but is not reproducible

Steps:

Move trains to Blocks out of siding
Set control devices to All
Set Random Mode
trains run then stop suddenly

Screenshots/Videos/Logs
See attached zip with logs, layout and screenshot

Stop with no cause.zip

Locations where they stopped, see Blue plate indicating the blue sensor location.

Novakasa · 2023-06-22T08:17:06Z

Looking at the logs, it seems that the emergency stop was triggered because the program on the "GreenCityTrain" hub was stopped for some reason. Do you know if that hub was still turned on after the emergency stop? The program could also have been stopped by pressing the button on the hub.

I don't see any error message from the hub in the ble-server log, so I'm not sure what could've caused the program stop by itself. If the batteries were empty, I think I would've seen a different error, but I might need to investigate this.

Novakasa · 2023-06-22T10:20:14Z

Has this happened more than once?

TekyTeky · 2023-06-22T18:10:07Z

The train was running normally, and was under a bridge so not near me at all, and was definitely not turned off when I moved it after they had stopped.
The control had changed from All to None… and this has happened a few times over the months at random times.
Maybe batteries were under 7 volts, I know I replaced them last night, but I thought it was before then, as I ran them for quite a while afterwards.

It had a marker error at that same place earlier, but I couldn’t figure out why so I moved the spacings to see if that resolved the issue. This is the new train and has not caused any issues before.
I’ll do some more running over the weekend and see how it goes.

TekyTeky · 2023-06-23T00:11:16Z

Also, I am running [v1.0.0-alpha.5-dev]. I guess you saw that in the logs... is that OK?
I was going to always keep up to date with recent master.
eg Build #174

Novakasa · 2023-06-23T08:28:54Z

It shouldn't be the batteries: apparently the hubs only turn off at 4.8V, so 7V should easily be fine (rechargeables only produce 7.2V at most).

Novakasa · 2023-06-23T08:29:46Z

I'll have to check whether hub program errors are still printed to ble-server log properly

TekyTeky · 2023-06-24T02:06:46Z

It just happened again after a long Random Run... I had some emergency stops at the beginning, that was my fault.. the one at the end is the one which stopped the app.
It almost seemed like an impasse, but both had an available route from where they stopped.

Logs - Stop2.zip

I normally just add the brickrail.log, but noticed there was another from this morning too:

brickrail_2023-06-24_11.35.29.log

Novakasa · 2023-06-24T08:23:03Z

Again, according to these logs the same hub just stopped the program for some reason. Would be interesting to see the next few logs (if it happens again) in case it's always the same hub

Novakasa · 2023-06-24T08:25:36Z

You can see for yourself which hub stopped its program by watching the left panel where the train hubs are listed. After the program has stopped, the "stop" button shows "start" instead. In your logs it seems to have always been the "GreenCityHub" so far.

Novakasa · 2023-06-24T08:36:16Z

Also, after this happens, have you tried turning "control devices" back on again? Normally, brickrail should then just start the program for the hub again, and if every train position is consistent with the virtual layout (which you probably have to fix manually), everything should work normally again.

TekyTeky · 2023-06-24T20:00:55Z

> You can see for yourself which hub stopped its program by watching the left panel where the train hubs are listed. After the program has stopped, the "stop" button shows "start" instead. In your logs it seems to have always been the "GreenCityHub" so far.

Yes, Ok… I’ll check next time.
As I said, this is the new train, maybe I’ll flash the hub again with the FW.
Now you say it is the Green train, the City train entered a Block and paused, when the Green entered its Block it’s then that it stopped. Maybe even in the same place as last time, after a Blue sensor.
I guess the BLE wouldn’t be out of range, as it is the furthest distance and under a bridge.. although that does seem unlikely.

TekyTeky · 2023-06-24T20:05:07Z

> Also, after this happens, have you tried turning "control devices" back on again?

Yes it starts again, and I think, but not 100% sure, that the Green train is in the correct location to just carry on. They do stop in the same location as the Virtual Layout… eg on a Block.

TekyTeky · 2023-06-24T20:16:10Z

I just checked the pic above of the Green train and the back driving wheel had de-railed.
Does the FW have provision to stop the SW if it gets jammed?
Because it is behind the bridge and buildings, I wouldn’t see if it righted itself again when it re-started. I’ll check the track today.

TekyTeky · 2023-06-24T22:59:22Z

Ok, it happened again, but with a Byte array error and then disconnected the BLE

I had increased the chroma on both trains as I was getting random Marker errors. So I restarted it and then got this error.
I had also re-downloaded the Green train with the Brickrail SW.
BytaArray-BLEDisconect.zip

Novakasa · 2023-06-28T17:09:54Z

> Ok, it happened again, but with a Byte array error and then disconnected the BLE
>
> I had increased the chroma on both trains as I was getting random Marker errors. So I restarted it and then got this error. I had also re-downloaded the Green train with the Brickrail SW. BytaArray-BLEDisconect.zip

This seems to be a different issue, so I opened #141

Novakasa · 2023-07-07T11:01:20Z

I just had this happen on my test layout ble_test_colors.brl

Unfortunately, I overwrote the ble-server.log out of stupidity (I really need to make it keep more files).

brickrail_2023-07-07_13.00.58.log

It definitely looks like the program just stops, without any errors printed to the console. I worked on the hub programs recently, and error printing definitely still works fine. Maybe we accidentally send a stop program command?

Novakasa · 2023-07-07T11:29:32Z

This could be caused by the watchdog implementation. The hub regularly sends alive data (along with battery status), and when it gets no response, it stops the program. This is used to stop the program in case the connection to the PC is lost for some reason. However, if the alive data or the response is not received properly, no retry is attempted. Most other communications do retry if something goes wrong, so we should also attempt retries with the alive data. If this is indeed the issue, that means my communication robustness protocol was not a waste of time, since the communication actually does go wrong sometimes.
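To make the suspected failure mode concrete, here is a minimal sketch of a watchdog that stops on a single missed acknowledge. It is not the actual hub program: `run_watchdog_once`, `simulate`, and the `send_alive` callable are hypothetical names, and the radio link is replaced by a list of booleans saying whether each alive message was acknowledged.

```python
from typing import Callable, Sequence


def run_watchdog_once(send_alive: Callable[[], bool]) -> str:
    """Run one watchdog cycle without any retry.

    send_alive is a hypothetical callable that sends the alive data and
    returns True if an acknowledge came back in time.
    """
    if send_alive():
        return "running"
    # A single missed acknowledge is treated as a lost connection.
    return "stopped"


def simulate(acks: Sequence[bool]) -> int:
    """Return the index of the cycle at which the program stops, or -1."""
    for cycle, ack in enumerate(acks):
        if run_watchdog_once(lambda: ack) == "stopped":
            return cycle
    return -1


# One dropped acknowledge on an otherwise healthy link stops the program.
print(simulate([True, True, False, True, True]))  # prints 2
```

With this structure the program stops at the first dropped acknowledge even though later ones would have arrived, which would match a stop with no error printed.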

Novakasa · 2023-07-07T11:38:46Z

Yes, this is pretty much confirmed, since it does resend the alive data before it stops the program, and in all the logs there is a very tight correlation between alive data (battery status) and program stopped. As a temporary fix, I will disable watchdog program stop. In the future, I should make it retry the alive data a number of times before stopping the program.
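A retrying version could look roughly like the sketch below. This is an illustration under assumptions, not the code that landed in the repository: `alive_check`, `make_link`, and `MAX_RETRIES` are hypothetical names, and the sketch assumes "retry 5 times" means one initial attempt plus up to five resends.

```python
from typing import Callable, Iterable

MAX_RETRIES = 5  # assumption: one initial attempt plus up to 5 resends


def alive_check(send_alive: Callable[[], bool],
                max_retries: int = MAX_RETRIES) -> bool:
    """Send alive data, resending after each missed acknowledge.

    Returns True as soon as one attempt is acknowledged, and False only
    if the initial attempt and all retries go unanswered.
    """
    for _attempt in range(1 + max_retries):
        if send_alive():
            return True
    return False


def make_link(outcomes: Iterable[bool]) -> Callable[[], bool]:
    """Build a fake send_alive that replays the given acknowledge outcomes."""
    it = iter(outcomes)

    def send_alive() -> bool:
        # Once the scripted outcomes run out, treat the link as dead.
        return next(it, False)

    return send_alive


# A single dropped acknowledge no longer stops the program.
print(alive_check(make_link([False, True])))  # prints True
# A link that stays silent for every attempt still triggers the stop.
print(alive_check(make_link([False] * 6)))    # prints False
```

The caller would stop the program only when `alive_check` returns False, so the watchdog still catches a lost connection while tolerating an occasional lost message.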


Novakasa · 2023-07-07T11:43:15Z

temp fix is now implemented on master

fixes #140

Novakasa self-assigned this Jun 22, 2023

Novakasa added the bug label Jun 23, 2023

Novakasa mentioned this issue Jun 28, 2023

smart_train.py running out of route #141

Open

Novakasa changed the title ~~[Bug] Emergency Stop with no error while runing Random Mode~~ [Bug] Train program randomly stops Jul 7, 2023

Novakasa added a commit that referenced this issue Jul 7, 2023

temp fix for program randomly stopping

bbdd9d8

also add comments to other hub programs to trigger redownload for all users, since this is only a io_hub change otherwise (#147). addresses #140

Novakasa changed the title ~~[Bug] Train program randomly stops~~ [Bug] Program stops if alive_data acknowledge fails only once Jul 7, 2023

Novakasa closed this as completed in 2c70c00 Aug 10, 2023

Novakasa added a commit that referenced this issue Aug 10, 2023

retry 5 times for alive data acknowledge

81ed669

fixes #140
