[Bug] Program stops if alive_data acknowledge fails only once #140

Closed
TekyTeky opened this issue Jun 22, 2023 · 19 comments
Assignees
Labels
bug Something isn't working

Comments

@TekyTeky

TekyTeky commented Jun 22, 2023

Describe the bug
While running in Random mode, both trains stopped and no error was shown.

To reproduce
This has happened previously but is not reproducible.

Steps:

  1. Move trains to Blocks out of siding
  2. Set control devices to All
  3. Set Random Mode
  4. Trains run, then stop suddenly

Screenshots/Videos/Logs
See the attached zip with logs, layout and screenshot.

Stop with no cause.zip

Locations where they stopped, see Blue plate indicating the blue sensor location.
IMG_2511
IMG_2512

@Novakasa
Owner

Looking at the logs, it seems that the emergency stop was triggered because the program on the "GreenCityTrain" hub was stopped for some reason. Do you know if that hub was still turned on after the emergency stop? The program could also have been stopped by pressing the button on the hub.

I don't see any error message from the hub in the ble-server log, so I'm not sure what could've caused the program stop by itself. If the batteries were empty, I think I would've seen a different error, but I might need to investigate this.

@Novakasa Novakasa self-assigned this Jun 22, 2023
@Novakasa
Owner

Has this happened more than once?

@TekyTeky
Author

The train was running normally, and was under a bridge so not near me at all, and was definitely not turned off when I moved it after they had stopped.
The control had changed from All to None… and this has happened a few times over the months at random times.
Maybe batteries were under 7 volts, I know I replaced them last night, but I thought it was before then, as I ran them for quite a while afterwards.

It had a marker error at that same place earlier, but I couldn’t figure out why so I moved the spacings to see if that resolved the issue. This is the new train and has not caused any issues before.
I’ll do some more running over the weekend and see how it goes.

@TekyTeky
Author

TekyTeky commented Jun 23, 2023

Also, I am running [v1.0.0-alpha.5-dev] (I guess you saw that in the logs). Is that OK?
I was planning to always keep up to date with recent master,
e.g. Build #174.

@Novakasa Novakasa added the bug Something isn't working label Jun 23, 2023
@Novakasa
Owner

It shouldn't be the batteries: apparently the hubs only turn off at 4.8V, so 7V should easily be fine (rechargeables only produce 7.2V at most).

@Novakasa
Owner

I'll have to check whether hub program errors are still printed to ble-server log properly

@TekyTeky
Author

TekyTeky commented Jun 24, 2023

It just happened again after a long Random run. I had some emergency stops at the beginning, which were my fault; the one at the end is the one that stopped the app.
It almost seemed like an impasse, but both had an available route from where they stopped.

Logs - Stop2.zip

I normally just add the brickrail.log, but noticed there was another from this morning too:

brickrail_2023-06-24_11.35.29.log

@Novakasa
Owner

Novakasa commented Jun 24, 2023

Again, according to these logs the same hub just stopped the program for some reason. Would be interesting to see the next few logs (if it happens again) in case it's always the same hub

@Novakasa
Owner

You can see for yourself which hub stopped the program by watching the left panel where the train hubs are listed. After the program has stopped, the "stop" button shows "start" instead. In your logs it seems it has always been the "GreenCityHub" so far.

@Novakasa
Owner

Also, after this happens, have you tried turning "control devices" back on again? Normally, brickrail should then just start the program for the hub again, and if every train position is consistent with the virtual layout (which you probably have to fix manually), everything should work normally again.

@TekyTeky
Author

TekyTeky commented Jun 24, 2023

> You can see for yourself which hub stopped the program by watching the left panel where the train hubs are listed. After the program has stopped, the "stop" button shows "start" instead. In your logs it seems it has always been the "GreenCityHub" so far.

Yes, Ok… I’ll check next time.
As I said, this is the new train, maybe I’ll flash the hub again with the FW.
Now that you say it is the Green train: the City train entered a Block and paused, and it was when the Green train entered its Block that it stopped. It may even have been in the same place as last time, after a Blue sensor.
I guess the BLE wouldn’t be out of range, as it is the furthest distance and under a bridge.. although that does seem unlikely.

@TekyTeky
Author

> Also, after this happens, have you tried turning "control devices" back on again?

Yes, it starts again, and I think (but am not 100% sure) that the Green train is in the correct location to just carry on. They do stop in the same location as in the Virtual Layout, e.g. on a Block.

@TekyTeky
Author

TekyTeky commented Jun 24, 2023

I just checked the pic above of the Green train, and the back driving wheel had derailed.
Does the FW have provision to stop the SW if it gets jammed?
Because it is behind the bridge and buildings, I wouldn’t see if it righted itself again when it re-started. I’ll check the track today.

@TekyTeky
Author

OK, it happened again, but this time with a byte array error, and then the BLE disconnected.

I had increased the chroma on both trains as I was getting random Marker errors. So I restarted it and then got this error.
I had also re-downloaded the Brickrail SW to the Green train.
BytaArray-BLEDisconect.zip

@Novakasa
Owner

> OK, it happened again, but this time with a byte array error, and then the BLE disconnected.
>
> I had increased the chroma on both trains as I was getting random Marker errors. So I restarted it and then got this error. I had also re-downloaded the Brickrail SW to the Green train. BytaArray-BLEDisconect.zip

This seems to be a different issue, so I opened #141

@Novakasa Novakasa changed the title from "[Bug] Emergency Stop with no error while runing Random Mode" to "[Bug] Train program randomly stops" Jul 7, 2023
@Novakasa
Owner

Novakasa commented Jul 7, 2023

I just had this happen on my test layout ble_test_colors.brl

Unfortunately I overwrote the ble-server.log out of stupidity (I really need to make it keep more files).

brickrail_2023-07-07_13.00.58.log

It definitely looks like the program just stops, without any errors printed to the console. I worked on the hub programs recently, and error printing definitely still works fine. Maybe we accidentally send a stop program command?

@Novakasa
Owner

Novakasa commented Jul 7, 2023

This could be caused by the watchdog implementation. The hub regularly sends alive data (along with battery status), and when it gets no response, it stops the program. This is meant to stop the program if the connection to the PC is lost for some reason. However, if the alive data or the response is not received properly, no retry is attempted. Most other communications do retry when something goes wrong, so we should also retry the alive data. If this is indeed the issue, it means my communication robustness protocol was not a waste of time, since the communication actually does go wrong sometimes.
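
To illustrate the suspected failure mode, here is a minimal Python sketch of a watchdog with no retry. It is not the actual Brickrail hub code: the function name `single_shot_watchdog` and the scripted list of acknowledges are hypothetical, chosen only to show that a single missed acknowledge on an otherwise healthy connection is enough to stop the program.

```python
from typing import Iterable, Optional


def single_shot_watchdog(acks: Iterable[bool]) -> Optional[int]:
    """Hypothetical model of a watchdog without retries.

    Each item in `acks` says whether the acknowledge for one alive
    message arrived. Returns the 1-based number of the alive message
    at which the program would be stopped, or None if it keeps running.
    """
    for n, ack_received in enumerate(acks, start=1):
        if not ack_received:
            # No retry: one missed acknowledge is treated as a lost connection.
            return n
    return None


# A connection that is healthy except for one dropped acknowledge.
stopped_at = single_shot_watchdog([True, True, False, True, True])
print(stopped_at)  # 3
```

In this model the later acknowledges would have arrived fine, but the program has already been stopped by then.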

@Novakasa
Owner

Novakasa commented Jul 7, 2023

Yes, this is pretty much confirmed, since it does resend the alive data before it stops the program, and in all the logs there is a very tight correlation between alive data (battery status) and program stopped. As a temporary fix, I will disable watchdog program stop. In the future, I should make it retry the alive data a number of times before stopping the program.
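
The retry idea could look roughly like the following Python sketch. This is only an assumption about the shape of a future fix, not the implemented one: `alive_check`, `make_scripted_link` and the default of three attempts are all hypothetical, and the scripted link stands in for the real BLE exchange.

```python
from typing import Callable, Iterable


def alive_check(send_alive: Callable[[], bool], max_attempts: int = 3) -> bool:
    """Send alive data up to `max_attempts` times.

    `send_alive` (hypothetical) sends one alive message and returns True
    if it was acknowledged. Returns True as soon as one attempt succeeds,
    False if every attempt fails.
    """
    for _ in range(max_attempts):
        if send_alive():
            return True
    return False


def make_scripted_link(acks: Iterable[bool]) -> Callable[[], bool]:
    """Test stand-in for the connection: replays a fixed list of outcomes."""
    outcomes = iter(acks)

    def send_alive() -> bool:
        # Once the script is exhausted, behave like a lost connection.
        return next(outcomes, False)

    return send_alive


# One dropped acknowledge followed by a successful retry: keep running.
keep_running = alive_check(make_scripted_link([False, True]))

# Connection really gone: every attempt fails, so the program should stop.
stop_program = not alive_check(make_scripted_link([]))
```

With this shape, the watchdog still stops the program when the PC is really unreachable, but a single lost packet no longer does.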

Novakasa added a commit that referenced this issue Jul 7, 2023
also add comments to other hub programs to trigger redownload for
all users, since this is only a io_hub change otherwise (#147).

addresses #140
@Novakasa Novakasa changed the title from "[Bug] Train program randomly stops" to "[Bug] Program stops if alive_data acknowledge fails only once" Jul 7, 2023
@Novakasa
Owner

Novakasa commented Jul 7, 2023

temp fix is now implemented on master

Novakasa added a commit that referenced this issue Aug 10, 2023