Lilygo RTL433 runs fine, then reboots every couple of minutes then hangs... #2092
Comments
Need to see the serial output.
OK...
Then a few minutes later: (note log level reverted to 'Notice')
Then again:
etc. So somehow there is heap corruption. Note that the last reported stack low water mark was 936 bytes before the first crash and about 3600 bytes before the subsequent crashes. So, if it is exhausting the stack, it must be doing so while decoding the current message, before the result gets reported via the RFtoMQTT routine. Again, it can run fine for hours or even days (in this case it ran fine for 4 days (!) before I started seeing the heap corruption). But once it occurs, it seems to recur, rebooting typically every few minutes for 5-10 or more times before becoming stable again. In this case it rebooted about 6 times over the course of about 2.5 hours. Any ideas what may be causing the heap corruption and/or how to troubleshoot it? The only thing I can think of is that some sensor reports intermittently (maybe it's only turned on every once in a while) and consumes excessive stack, causing the heap corruption.
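The "stack low water mark" in these logs is presumably the smallest amount of free stack the task has ever had (FreeRTOS exposes this through uxTaskGetStackHighWaterMark()). That is why a value sampled only when a message is reported cannot reveal an overrun that happens mid-decode. As a rough illustration of how such a figure is usually produced, the portable C++ sketch below paints a simulated stack with a fill byte and counts the bytes that were never overwritten. The fill value, the buffer, and the helper names are assumptions for illustration, not the firmware's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Pattern used to "paint" the unused stack before the task runs.
// Assumed value for this simulation; FreeRTOS uses its own fill byte.
constexpr std::uint8_t kFillByte = 0xA5;

// Fill the whole region with the pattern.
void paint_stack(std::uint8_t* stack, std::size_t size) {
    std::memset(stack, kFillByte, size);
}

// Count untouched bytes from the far end (lowest address) of a
// downward-growing stack. The result is the minimum free space ever seen,
// so it can only shrink. Caveat: a frame that happens to store the fill
// byte at its deepest slot would be miscounted as free.
std::size_t high_water_mark(const std::uint8_t* stack, std::size_t size) {
    assert(stack != nullptr);
    std::size_t untouched = 0;
    while (untouched < size && stack[untouched] == kFillByte) {
        ++untouched;
    }
    return untouched;
}

// Simulate a call chain that uses `depth` bytes from the top of the stack.
void simulate_usage(std::uint8_t* stack, std::size_t size, std::size_t depth) {
    assert(depth <= size);
    std::memset(stack + (size - depth), 0x00, depth);
}
```

Because the count never rises, a reading of 936 bytes only says the deepest call so far left 936 bytes spare; a call that overran the region and corrupted neighbouring memory would crash before the next report could be made.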
Maybe try increasing the stack again, and if this is still the problem, we should check whether there is a way to limit the decoders' stack consumption and avoid such crashes.
That seems a bit "brute force", though.
I like your idea of limiting decoder consumption, and perhaps logging any time a decoder tries to use more than that amount, so that we know the name of the decoder and how much stack it tried to consume...
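The per-decoder logging suggested above could be approached, under assumptions, by reading the task's free-stack watermark before and after each decoder call and, when the watermark drops and total usage exceeds a chosen budget, recording the decoder's name. DecoderStackGuard, StackReport, and the injected read_free_stack callback are hypothetical names invented for this sketch; on the device the callback would presumably wrap uxTaskGetStackHighWaterMark(nullptr), and where to hook run() into the decoder loop is not shown in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record of a decoder that pushed stack use past the budget.
struct StackReport {
    std::string decoder;
    std::size_t bytes_used;  // task stack in use at the deepest point reached
};

// Wraps decoder calls and logs any that set a new, over-budget watermark.
// read_free_stack is injected so the logic can be tested off-target.
class DecoderStackGuard {
public:
    DecoderStackGuard(std::size_t stack_size, std::size_t budget,
                      std::function<std::size_t()> read_free_stack)
        : stack_size_(stack_size), budget_(budget),
          read_free_stack_(std::move(read_free_stack)) {
        assert(budget <= stack_size);
    }

    // Run one decoder and attribute any new low-water mark to it.
    void run(const std::string& name, const std::function<void()>& decode) {
        const std::size_t before = read_free_stack_();
        decode();
        const std::size_t after = read_free_stack_();
        if (after < before) {
            // The watermark only moves when this call went deeper than any
            // earlier one, so only then can usage be attributed to it.
            const std::size_t used = stack_size_ - after;
            if (used > budget_) {
                reports_.push_back({name, used});
                std::printf("decoder %s reached %zu bytes of stack (budget %zu)\n",
                            name.c_str(), used, budget_);
            }
        }
    }

    const std::vector<StackReport>& reports() const { return reports_; }

private:
    std::size_t stack_size_;
    std::size_t budget_;
    std::function<std::size_t()> read_free_stack_;
    std::vector<StackReport> reports_;
};
```

This only detects; it does not limit. A decoder that actually overflows still crashes before it can be logged, so the budget would need to sit well below the configured stack size to catch near misses. It also reports a decoder only when it goes deeper than every earlier call, since the watermark never rises.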
Add … It should show exactly where the error occurred.
I did this, by the way, the last time I had crashes, and it pointed to increasing the stack size, but I'm happy to enable it again to see if I get the same error.
I received another half dozen reboots this morning with
AND
etc. So maybe it's not a decoder issue? Or at least not like the ones causing #2043, where I got an error saying … Any ideas on how to debug this further?
Oh, if you run a build and upload it to the board, then run the monitor, it will convert the backtrace addresses to actual lines of code, so we can pinpoint the issue.
I rebuilt and re-uploaded (using PlatformIO) with
That's weird, as it should have given a longer backtrace. If you build a different project afterwards, that does break the feature.
I will try to rebuild again... |
I am not seeing anything in the backtrace beyond what I reported above... Meanwhile, it ran fine for 2 weeks without any crash or reboot. Then today, for about 90 minutes, it rebooted every minute before stabilizing. My evidence is as follows:
I just increased the stack size to 15000 to see if that solves the problem, but unfortunately it's not crashing now, so I have to wait until the bad sensor or sensor message comes online again... Stay tuned...
I am running 2 different Lilygo rtl433 ESP32 devices.
They had been running fine for the last couple of weeks.
Then, recently, the one downstairs started exhibiting the following behavior.
The upstairs device seems completely stable.
Neither device ever seems to drop below about 2 KB of free rtl_433 stack space, nor does either one run out of memory.
It's unlikely to be a power issue, as I have a Li-ion battery pack as backup on the Lilygo device.
More generally, I don't think it's the hardware or firmware per se, as swapping the two devices caused the other device, now downstairs, to start exhibiting the behavior above.
Any ideas on what could be causing this?
The only thing I can think of is that a new device or device reading comes online that is only received at the downstairs location, and somehow that triggers a bug or even a stack overflow (before it gets reported to the MQTT broker)...
@1technophile @NorthernMan54 have you seen anything like this?
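If the culprit is a message that only reaches the downstairs receiver, one low-cost way to name it is a breadcrumb: record the decoder name just before it runs, clear a flag when it returns, and check the flag at the next boot. The sketch below is hypothetical (Breadcrumb, enter_decoder, leave_decoder, and decoder_at_last_crash are invented here, and where to call them inside rtl_433_ESP is not shown in this thread). On an ESP32 the struct would have to live in memory that survives a software reset, for example a variable declared with ESP-IDF's RTC_NOINIT_ATTR; this portable version only simulates that with an ordinary object.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical breadcrumb. On the device it would need to sit in
// memory that is not cleared on a software/panic reset.
struct Breadcrumb {
    std::uint32_t magic;        // marks the contents as valid
    char decoder[32];           // name of the decoder currently running
    std::uint32_t in_progress;  // 1 while a decoder is executing
};

constexpr std::uint32_t kBreadcrumbMagic = 0x52544C33;  // arbitrary marker

// Call just before a decoder runs; the flag is set last.
void enter_decoder(Breadcrumb& crumb, const char* name) {
    assert(name != nullptr);
    std::strncpy(crumb.decoder, name, sizeof(crumb.decoder) - 1);
    crumb.decoder[sizeof(crumb.decoder) - 1] = '\0';
    crumb.magic = kBreadcrumbMagic;
    crumb.in_progress = 1;
}

// Call right after the decoder returns normally.
void leave_decoder(Breadcrumb& crumb) { crumb.in_progress = 0; }

// Call once at boot: returns the decoder that was running when the previous
// boot died, or nullptr if there is nothing trustworthy to report.
const char* decoder_at_last_crash(Breadcrumb& crumb) {
    const char* culprit = nullptr;
    if (crumb.magic == kBreadcrumbMagic && crumb.in_progress == 1) {
        culprit = crumb.decoder;
    }
    crumb.in_progress = 0;  // avoid reporting the same crash twice
    return culprit;
}
```

Heap or stack corruption can overwrite the breadcrumb itself, so the magic-number check is only a sanity filter, not a guarantee that the recorded name is accurate.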