Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CH6178] Checkpoints/diagnostics #1369

Closed
wants to merge 20 commits into from
Closed

Conversation

avtolstoy
Copy link
Member

@avtolstoy avtolstoy commented Aug 17, 2017

Superseded by #1451

Problem

When a device hard faults, crashes, freezes or is otherwise misbehaving, it is presently difficult to know what the last action the device took is and whether the fault lies in application code, system code, peripheral code etc..

A checkpoints API would allow application and system firmware to record their current execution progress and provide an indication of where execution halted prior to the crash. This information is then published to the cloud on next successful connection.

It is also possible to get stack trace of all the threads using a naive stack unwinder, that just scans through the thread stack, looking for flash addresses (within system and user part bounds) and checks if there is a branching instruction at that location.

It would be useful to store per-thread stack traces in addition to checkpoint info whenever the device enters a hardfault, a panic state, or on demand. This will provide a better overview of the system state and help understand the cause of the crash or deadlock for difficult to reproduce issues.

Solution

System code may use the following macros defined in diagnostic.h:

  • DIAGNOSTIC_CHECKPOINT() - regular instruction address type checkpoint
  • DIAGNOSTIC_PANIC_CHECKPOINT() - special instruction address type checkpoint which should only be called from a panic handler, which will prevent further modification of diagnostic data
  • DIAGNOSTIC_CRASH_CHECKPOINT(pc) - special instruction address type checkpoint which should only be called from a crash (hardfault) handler, which will prevent further modification of diagnostic data. An instruction address needs to be manually passed as an argument.
  • DIAGNOSTIC_UPDATE() - forces a full system state update (including stacktraces).

All LOG statements, except for LOG_DUMP, make a call to DIAGNOSTIC_CHECKPOINT().
Both hardfault and panic handlers make a call to their respective macros: DIAGNOSTIC_CRASH_CHECKPOINT(), DIAGNOSTIC_PANIC_CHECKPOINT()

Application code may use the following macros defined in spark_wiring_diagnostic.h:

  • CHECKPOINT() - a standard variant of CHECKPOINT() macro. Implementation depends on a preprocessor macro DIAGNOSTIC_ELF_AVAILABLE.
    • When defined and equal to 1, CHECKPOINT() macro internally calls DIAGNOSTIC_CHECKPOINT(), making it an instruction address type checkpoint
    • When undefined or set to 0, CHECKPOINT() macro internally makes a call to DIAGNOSTIC_TEXT_CHECKPOINT(), which saves the location of the checkpoint in textual format: __FILENAME__:__LINE__
  • CHECKPOINT(text) - a specialized variant of CHECKPOINT() macro, which forces a textual checkpoint with the text passed as an argument

All Logger class calls, except for print, printf, write and dump, internally include a call to _LOG_CHECKPOINT() macro defined in spark_wiring_diagnostic.h, which is only enabled when DIAGNOSTIC_ELF_AVAILABLE == 1. It acquires an address of the calling function and uses that as the checkpoint instruction address.

See CH6178 for implementation details.

Steps to Test

  • unit tests
  • app/checkpoint (see README.md)

Example App

  • app/checkpoint

References

  • CH6178

Completeness

  • User is totes amazing for contributing!
  • Contributor has signed CLA (Info here)
  • Problem and Solution clearly stated
  • Run unit/integration/application tests on device
  • Added documentation (required)
  • Added to CHANGELOG.md after merging (add links to docs and issues)

Feature

  • [PR #1369] [Photon/P1/Electron] Added Checkpoints/diagnostics

…hen built by C (as opposed to C++) compiler. Replace const variables like OS_THREAD_INVALID_HANDLE in concurrent_hal.h with defines when being built by C compiler
…d of passing an integer argument to __builtin_return_address
@avtolstoy avtolstoy added this to the 0.8.0 milestone Aug 17, 2017
void* stack;
void* stack_start;
void* stack_end;
} os_thread_dump_info_t;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we wanted to expand this structure later while retaining backwards compatibility, what would we need to add/change now? A reserved void * at the end?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's currently only used within a single module and is not exported, so that shouldn't be a concern at the moment. If we later decide to export it, the first reserved becomes a version field as usual.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing that out @avtolstoy, sounds good.

@technobly
Copy link
Member

This great feature is also going to need some docs to ensure users understand how to use it.

@m-mcgowan
Copy link
Contributor

Andrey, please fix up the conflicts so we can rebase for 0.8.0-rc.2 release.

@m-mcgowan m-mcgowan modified the milestones: 0.8.0, 0.8.0-rc.2 Dec 13, 2017
@avtolstoy
Copy link
Member Author

We also need to rename all the mentions of diagnostic to trace/tracing as originally mentioned in [CH8404] to avoid the confusion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants