This repository has been archived by the owner on Sep 27, 2021. It is now read-only.

Investigate hard faults #128

Closed
tygamvrelis opened this issue Nov 15, 2018 · 2 comments

Comments

@tygamvrelis
Member

tygamvrelis commented Nov 15, 2018

Describe the bug
The program will occasionally run into a hard fault if the IMU is not connected.

To Reproduce
Steps to reproduce the behavior:

  1. Program the Robot_F4 master program onto an F4 board
  2. Connect all peripherals except the IMU
  3. Execute the program and see the hard fault appear after some time

Expected behavior
All parts of the program should continue functioning as usual. That is, the IMU task should continue running at 500 Hz (the data collected during this time can be NaN or whatever the previous data was; there are no strict requirements for this yet), and the IMU-independent tasks should continue running as though nothing were different. There certainly should not be any hard faults.
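
One way the expected behavior could look in code is sketched below. This is a host-side illustration only, not the project's IMU driver: `ImuSample`, `ImuReadFn`, `ImuFallback`, `imu_update`, and the two stub read functions are all hypothetical names introduced here. The idea is that a failed read never stalls the 500 Hz cycle; the task either reports NaN or keeps the previous sample.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sample type; the project's real IMU struct is not shown in this issue. */
typedef struct {
    float accel[3];
    float gyro[3];
} ImuSample;

/* Hypothetical driver hook: returns false when the bus transaction fails,
 * for example because no IMU is attached. */
typedef bool (*ImuReadFn)(ImuSample *out);

typedef enum { IMU_FALLBACK_NAN, IMU_FALLBACK_HOLD_LAST } ImuFallback;

/* Intended to be called once per task cycle. On a failed read it either
 * fills the sample with NaN or leaves the previous sample untouched. */
bool imu_update(ImuReadFn read, ImuFallback policy, ImuSample *latest)
{
    assert(read != NULL && latest != NULL);

    ImuSample fresh;
    if (read(&fresh)) {
        *latest = fresh;
        return true;
    }
    if (policy == IMU_FALLBACK_NAN) {
        for (int i = 0; i < 3; ++i) {
            latest->accel[i] = NAN;
            latest->gyro[i] = NAN;
        }
    }
    /* IMU_FALLBACK_HOLD_LAST: *latest keeps its previous contents. */
    return false;
}

/* Stub that simulates a disconnected IMU (every read fails). */
bool imu_read_disconnected(ImuSample *out)
{
    (void)out;
    return false;
}

/* Stub that simulates a working IMU returning a fixed reading. */
bool imu_read_ok(ImuSample *out)
{
    for (int i = 0; i < 3; ++i) {
        out->accel[i] = 9.5f;
        out->gyro[i] = 0.25f;
    }
    return true;
}
```

Passing the read function in as a pointer keeps the fallback policy testable without hardware; on the target, the stub would be replaced by whatever routine actually performs the I2C transfer.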

Additional Context
It is possible that these hard faults could be related to something within the I2C hardware not liking us making calls with nothing connected. After all, the I2C hardware already has silicon bugs related to the I2C flags...
On the other hand, @rfairley used a barebones F4 dev board with no peripherals this past week for testing, and did not run into any hard faults. This suggests an issue related to #126.

@tygamvrelis tygamvrelis changed the title IMU routines need to be investigated for robustness Investigate hard faults Nov 15, 2018
@tygamvrelis
Member Author

tygamvrelis commented Nov 17, 2018

After working on this for a few hours today on an F4 dev board with no peripherals connected, I have made a few key discoveries:

  1. The hard faults go away when we turn off the optimizing compiler. It was a mistake to have it on in the first place, since our code has not been carefully crafted to work properly with optimization (the Maintenance Tracker issue, #121, tracks the need to add volatile in many places, so that is a good start). When the C and C++ compilers use -Og (optimize for debug), as they should, there are no hard faults.

  2. At some point while I was debugging, there was a runtime assertion failure in the FreeRTOS kernel related to mutexes in the Tx/Rx tasks. Increasing the xSemaphoreTake block time to osWaitForever fixed the issue. The underlying cause was that the result of taking the mutex was not being checked. Consequently, there were cases where the mutex was given by a thread which had not taken it in the first place, which led to the runtime assertion failure.

    • Note: this only arose immediately after I started running the code on the F4 dev board. This dev board requires uart2 to be used for PC comm instead of uart5, and I forgot to change which thread was being woken in one of the callbacks, so the TX thread was woken whenever the motors were written to. Once I fixed that, this issue no longer arose, by design (priorities, etc.), but it is still good to be aware of and fix in the future.
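
The mutex problem described in item 2 can be illustrated with a small host-side sketch. It does not use the real FreeRTOS API: `MockMutex`, `mock_take`, `mock_give`, and `transmit_guarded` are hypothetical stand-ins for a mutex handle, `xSemaphoreTake` with a finite block time, `xSemaphoreGive`, and a Tx/Rx critical section. The point it shows is that both the critical section and the give should be skipped when the take fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Host-side stand-in for a mutex: records which task holds it.
 * A holder of 0 means "not held". */
typedef struct {
    int holder;
} MockMutex;

/* Stand-in for a take with a finite block time: returns false
 * (a timeout) if another task still holds the mutex. */
bool mock_take(MockMutex *m, int task_id)
{
    assert(m != NULL && task_id != 0);
    if (m->holder != 0) {
        return false;
    }
    m->holder = task_id;
    return true;
}

/* Stand-in for a give: only the holder may release the mutex.
 * Per the comment above, the real kernel hit an assertion in this
 * situation; the mock reports the misuse through its return value. */
bool mock_give(MockMutex *m, int task_id)
{
    assert(m != NULL);
    if (m->holder != task_id) {
        return false;
    }
    m->holder = 0;
    return true;
}

/* Pattern: the take result guards both the critical section and the give. */
bool transmit_guarded(MockMutex *m, int task_id, int *tx_count)
{
    assert(tx_count != NULL);
    if (!mock_take(m, task_id)) {
        return false;      /* skip this cycle; do not give */
    }
    ++*tx_count;           /* critical section */
    return mock_give(m, task_id);
}
```

With the real API, the same shape would compare the return value of `xSemaphoreTake` against `pdTRUE` before entering the critical section. Blocking forever avoids the timeout path, but checking the result keeps the code correct even if a finite block time is reintroduced later.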

Ultimately, the root cause of the hard faults was mistakenly using the optimizing compiler. This will be fixed in the next PR. Nevertheless, the investigation brought to our attention a potential issue with the mutex in the Tx and Rx threads, so it is good to have made this mistake.
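
As an illustration of why missing volatile qualifiers can matter once optimization is enabled, here is a minimal sketch of a flag shared between an interrupt callback and a polling task. The names `rx_complete`, `on_rx_complete_isr`, `wait_for_rx`, and `rx_event_count` are hypothetical and not taken from the project. Without `volatile`, an optimizing compiler is allowed to read the flag once and reuse that value inside the loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Shared between an interrupt handler and a task. volatile forces a
 * fresh read on every access; it does not make the accesses atomic. */
static volatile bool rx_complete = false;
static volatile uint32_t rx_events = 0;

/* Hypothetical interrupt callback: marks a transfer as finished. */
void on_rx_complete_isr(void)
{
    rx_complete = true;
    rx_events = rx_events + 1;
}

/* Polls the flag up to max_polls times. Returns true and clears the
 * flag if it was set, false if the poll budget ran out. */
bool wait_for_rx(uint32_t max_polls)
{
    assert(max_polls > 0);
    for (uint32_t i = 0; i < max_polls; ++i) {
        if (rx_complete) {
            rx_complete = false;
            return true;
        }
    }
    return false;
}

/* Accessor so callers do not touch the counter directly. */
uint32_t rx_event_count(void)
{
    return rx_events;
}
```

In an RTOS-based program, a task notification or semaphore is usually a better way to wake a thread than polling; the sketch only isolates the effect of the qualifier.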

@tygamvrelis
Member Author

Fixed in #134 and #109
