[Bug] Program stops if alive_data acknowledge fails only once #140
Comments
Looking at the logs, it seems that the emergency stop was triggered because the program on the "GreenCityTrain" hub was stopped for some reason. Do you know if that hub was still turned on after the emergency stop? The program could also have been stopped by pressing the button on the hub. I don't see any error message from the hub in the ble-server log, so I'm not sure what could have caused the program to stop by itself. If the batteries were empty, I think I would have seen a different error, but I might need to investigate this.
Has this happened more than once?
The train was running normally and was under a bridge, so not near me at all, and it was definitely not turned off when I moved it after they had stopped. It had a marker error at that same place earlier, but I couldn’t figure out why, so I moved the spacings to see if that resolved the issue. This is the new train and has not caused any issues before.
Also, I am running [v1.0.0-alpha.5-dev]. I guess you saw that in the logs. Is that OK?
It shouldn't be the batteries: apparently the hubs only turn off at 4.8V, so 7V should easily be fine (rechargeables only produce 7.2V at most).
I'll have to check whether hub program errors are still printed to the ble-server log properly.
It just happened again after a long Random Run. I had some emergency stops at the beginning, which were my fault. The one at the end is the one that stopped the app. I normally just add the brickrail.log, but noticed there was another from this morning too:
Again, according to these logs, the same hub just stopped the program for some reason. It would be interesting to see the next few logs (if it happens again) in case it's always the same hub.
You can see for yourself which hub stopped the program by watching the left panel where the train hubs are listed. After the program has stopped, the "stop" button shows "start" instead. In your logs it seems it has always been the "GreenCityHub" so far.
Also, after this happens, have you tried turning "control devices" back on again? Normally, brickrail should then just start the program for the hub again, and if every train position is consistent with the virtual layout (which you probably have to fix manually), everything should work normally again.
Yes, OK… I’ll check next time.
Yes, it starts again, and I think, but am not 100% sure, that the Green train is in the correct location to just carry on. They do stop in the same location as the Virtual Layout, e.g. on a Block.
I just checked the pic above of the Green train, and the back driving wheel had derailed.
OK, it happened again, but this time with a byte array error, and then the BLE disconnected. I had increased the chroma on both trains as I was getting random marker errors. So I restarted it and then got this error.
This seems to be a different issue, so I opened #141.
I just had this happen on my test layout. Unfortunately, I overwrote the ble-server.log out of stupidity (I really need to make it keep more files). Log: brickrail_2023-07-07_13.00.58.log. It definitely looks like the program just stops, without any errors printed to the console. I worked on the hub programs recently, and error printing definitely still works fine. Maybe we accidentally send a stop program command?
This could be caused by the watchdog implementation. The hub regularly sends alive data (along with battery status), and when it gets no response, it stops the program. This is used to stop the program in case the connection to the PC is lost for some reason. If, however, the alive data or the response is not received properly, no retry is attempted. Most other communications do retry if something goes wrong, so we should also attempt retries with the alive data. If this is indeed the issue, that means my communication robustness protocol was not a waste of time, since the communication actually does go wrong sometimes.
Yes, this is pretty much confirmed, since it does resend the alive data before it stops the program, and in all the logs there is a very tight correlation between alive data (battery status) and program stopped. As a temporary fix, I will disable the watchdog program stop. In the future, I should make it retry the alive data a number of times before stopping the program.
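For illustration, the "retry a number of times before stopping" idea could look roughly like the sketch below. This is not the actual Brickrail hub code: `alive_acknowledged`, `watchdog_tick`, `send_alive`, `stop_program` and `max_retries` are all hypothetical names, and `send_alive` is assumed to return `True` when the PC acknowledged the alive data.

```python
from typing import Callable


def alive_acknowledged(send_alive: Callable[[], bool], max_retries: int = 3) -> bool:
    """Send alive data, retrying up to max_retries times after a failure.

    Returns True as soon as one attempt is acknowledged, and False only
    if all 1 + max_retries attempts fail. (Hypothetical helper.)
    """
    for _ in range(1 + max_retries):
        if send_alive():
            return True
    return False


def watchdog_tick(
    send_alive: Callable[[], bool],
    stop_program: Callable[[], None],
    max_retries: int = 3,
) -> None:
    """One watchdog cycle: stop the program only if every attempt fails."""
    if not alive_acknowledged(send_alive, max_retries):
        stop_program()
```

With this shape, a single lost acknowledgement no longer stops the program; only a run of consecutive failures, which is more likely to mean the connection is really gone, does.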
temp fix is now implemented on master |
Describe the bug
While running in Random mode, both trains stopped and no error was shown.
To reproduce
This has happened previously but is not reproducible.
Steps:
Screenshots/Videos/Logs
See the attached zip with logs, layout and screenshot:
Stop with no cause.zip
Locations where they stopped, see Blue plate indicating the blue sensor location.