[CH6178] Tracer/checkpoints (rebase) #1451
An excellent piece of work and fantastic to have this feature available! We will release it in 0.8.0-rc.2 so please ensure corresponding docs are available. In general, I'd like to see more source code comments describing why things are done - I've added some inline comments for areas that I feel need some explanation.
os_thread_t os_thread_current();

typedef struct {
    uint8_t reserved;
do we need a size field here for extensibility?
That's a question in the original PR as well :)
It's currently only used within a single module and is not exported, so that shouldn't be a concern at the moment. If we later decide to export it, the first reserved becomes a version field as usual.
@@ -15,14 +15,14 @@ AR = $(GCC_ARM_PATH)$(GCC_PREFIX)gcc-ar
 #

 # C compiler flags
-CFLAGS += -g3 -gdwarf-2 -Os -mcpu=cortex-m3 -mthumb
+CFLAGS += -g3 -gdwarf-2 -Os -mcpu=cortex-m3 -mthumb -fomit-frame-pointer
Could you say a few words about the omit-frame-pointer option? Is it only an optimization or does it have a functional purpose?
This optimization is already enabled by -Os (actually by -O1, see https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html), but it's better to specify it explicitly.
Disabling this option will lead to increased flash and stack usage (at the very least +16 bytes on each function call without tail-call optimization) and will interfere with the stack tracer implemented in this PR. MbedTLS bignum assembly optimizations will also fail to build, because R7 is then unavailable: it stores the frame pointer.
#include <malloc.h>
#include "timer_hal.h"

extern "C" void vTaskGetStackInfo( TaskHandle_t pxTask, void** stack_ptr, void** start_stack_ptr, void** end_stack_ptr );
Can this be pushed to a FreeRTOS header?
👍
    return 0;
}

os_result_t os_thread_dump(os_thread_t thread, os_thread_dump_callback_t callback, void* ptr)
We should document that the callback is executed in a critical section and should ensure it completes independently of the progress of other threads.
👍 TracerService also only calls it with interrupts disabled.
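To illustrate the constraint discussed above, here is a minimal sketch of a dump callback that is safe to run inside a critical section: it takes no locks, does not allocate, never waits on another thread, and does a bounded amount of work. All names here (`example_thread_info_t`, `example_dump_buffer_t`, `example_dump_callback`) and the callback signature are assumptions for illustration; the real `os_thread_dump_callback_t` in this PR may look different.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; the real thread info struct and callback
 * signature used by os_thread_dump() may differ. */
typedef struct {
    const char* name;
    uintptr_t stack_start;
    uintptr_t stack_end;
} example_thread_info_t;

#define EXAMPLE_MAX_THREADS 8
#define EXAMPLE_NAME_LEN 16

/* Preallocated by the caller before entering the critical section. */
typedef struct {
    char names[EXAMPLE_MAX_THREADS][EXAMPLE_NAME_LEN];
    size_t count;
} example_dump_buffer_t;

/* No locks, no heap, no waiting: only a bounded copy into caller memory.
 * Returns 0 to continue, non-zero to ask the caller to stop iterating. */
int example_dump_callback(const example_thread_info_t* info, void* ptr) {
    example_dump_buffer_t* buf = (example_dump_buffer_t*)ptr;
    if (buf->count >= EXAMPLE_MAX_THREADS) {
        return 1; /* buffer full: stop instead of blocking or allocating */
    }
    strncpy(buf->names[buf->count], info->name, EXAMPLE_NAME_LEN - 1);
    buf->names[buf->count][EXAMPLE_NAME_LEN - 1] = '\0';
    buf->count++;
    return 0;
}
```

Anything that may block (logging over a serial port, taking a mutex, publishing) would be done after the dump returns, working from the copied data.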
int count = 0;

for (uint32_t* sp = (uint32_t*)info->stack_start; sp <= (uint32_t*)info->stack_end; sp++) {
black magic! 🤘
Somewhat documented blackmagic :)
} else {
    // Insert new at the end
    if (freeSpace() < (sizeof(ThreadEntry) + size(chkpt) + strnlen(info->name, maxThreadNameLength) + 1)) {
        return TRACER_ERROR_NO_SPACE;
should we use a circular buffer so that the most recent entries are preserved?
That's a good idea; however, in theory we should not run out of space easily. There are at least 1K bytes (~700 on Electron) of system retained RAM and we do not have that many threads (especially on Electron).
We do not generate full stacktraces on every call (only when needed, e.g. on a USB request or in the hardfault/watchdog ISR), and every thread entry only has a header struct (8 bytes) + its name (up to 16 bytes) + a stored checkpoint (4 bytes).
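As a rough check of the numbers above, the worst-case entry size and the resulting thread capacity can be worked out as follows. This is a back-of-the-envelope sketch only: the constant and function names are hypothetical, the sizes are the ones quoted in this comment, and it assumes address-type checkpoints and ignores any bookkeeping outside the per-thread entries (including the extra terminator byte added in the free-space check).

```c
#include <stddef.h>

/* Sizes quoted in the review comment; names are illustrative only. */
enum {
    EXAMPLE_HEADER_SIZE = 8,      /* per-thread header struct */
    EXAMPLE_MAX_NAME_SIZE = 16,   /* thread name, upper bound */
    EXAMPLE_CHECKPOINT_SIZE = 4,  /* one instruction-address checkpoint */
    EXAMPLE_MAX_ENTRY_SIZE =
        EXAMPLE_HEADER_SIZE + EXAMPLE_MAX_NAME_SIZE + EXAMPLE_CHECKPOINT_SIZE
};

/* Number of worst-case thread entries that fit in the retained area. */
size_t example_min_thread_capacity(size_t retained_bytes) {
    return retained_bytes / EXAMPLE_MAX_ENTRY_SIZE;
}
```

With a worst-case entry of 28 bytes, `example_min_thread_capacity(1024)` gives 36 entries and `example_min_thread_capacity(700)` gives 25, which supports the point that checkpoint-only entries are unlikely to exhaust the buffer.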
@@ -26,6 +26,8 @@ CFLAGS += -DINCLUDE_PLATFORM=1
# platforms.h
ifeq ($(PLATFORM_ID),3)
INCLUDE_DIRS += $(PROJECT_ROOT)/platform/shared/inc
INCLUDE_DIRS += $(PROJECT_ROOT)/platform/MCU/gcc/inc
This is unexpected - could you explain why the gcc platform is brought in for all user apps?
Only for GCC platform: ifeq ($(PLATFORM_ID),3)
@@ -7,6 +7,6 @@ TARGET_TYPE = a

 BUILD_PATH_EXT=$(COMMUNICATION_BUILD_PATH_EXT)

-DEPENDENCIES = hal dynalib services wiring crypto
+DEPENDENCIES = hal dynalib services wiring crypto platform
I imagine this is because of the traces in the logs?
Correct, mainly platform_tracer.h.
@@ -1,8 +1,8 @@

system_part3_start = 0x8060000;

system_part3_ram_end = 0x2001D800 /* 0x20200000-10K */;
system_part3_ram_start = 0x2001c000 /* end of SRAM - 16K */;
system_part3_ram_end = 0x2001D800 - 1K /* 0x20200000-10K-1K */;
what's the 1K reserved for?
At the time of writing, system-part3 RAM was overflowing; this change might no longer be needed.
int tracer_save_checkpoint__(tracer_checkpoint_t* chkpt, uint32_t flags, void* reserved) {
    if (callbacks2.tracer_save_checkpoint) {
        return callbacks2.tracer_save_checkpoint(chkpt, flags, reserved);
Can you explain please why we need two distinct callback stores?
One for system-part1, the other for system-part3. Ideally these should be moved into an .inc and built as part of module-specific source files, or I guess both should be enabled simultaneously and built as weak, as otherwise these will probably cause issues with non-clean builds. 👍
…ed_system section size
…hen built by C (as opposed to C++) compiler. Replace const variables like OS_THREAD_INVALID_HANDLE in concurrent_hal.h with defines when being built by C compiler
…d of passing an integer argument to __builtin_return_address
…opy of previous trace data
8e40494 to 95628c0
submission notes
Problem
This is a rebase of #1369 on develop with renaming according to [CH8404] and some additional backup SRAM overflow checks.
Solution
When a device hard faults, crashes, freezes or otherwise misbehaves, it is presently difficult to know what the last action the device took was and whether the fault lies in application code, system code, peripheral code, etc.
A checkpoints API would allow application and system firmware to record their current execution progress and provide an indication of where execution halted prior to the crash. This information is then published to the cloud on next successful connection.
It is also possible to get a stack trace of all the threads using a naive stack unwinder that simply scans through each thread's stack, looks for flash addresses (within system and user part bounds) and checks whether there is a branching instruction at that location.
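The scanning approach described above can be sketched roughly as follows. This is not the implementation from this PR: the names (`example_flash_t`, `example_is_return_address`, `example_scan_stack`) are hypothetical, the flash contents are passed in as an array so the sketch can run off-target, and it assumes Thumb code, where a saved return address has bit 0 set and immediately follows a 32-bit `BL` or a 16-bit `BLX <reg>` instruction.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one flash region (e.g. a system or user part),
 * with a host-side copy of its contents. */
typedef struct {
    uint32_t base;         /* first flash address of the region (even) */
    const uint16_t* code;  /* Thumb halfwords stored starting at `base` */
    size_t halfwords;      /* number of halfwords in `code` */
} example_flash_t;

/* Reads the halfword at `addr`; returns false if it lies outside the region. */
static bool example_read_halfword(const example_flash_t* f, uint32_t addr, uint16_t* out) {
    if (addr < f->base || (addr - f->base) / 2 >= f->halfwords) {
        return false;
    }
    *out = f->code[(addr - f->base) / 2];
    return true;
}

/* True if `word` looks like a Thumb return address: Thumb bit set and
 * directly preceded, inside the region, by a branch-with-link instruction. */
bool example_is_return_address(const example_flash_t* f, uint32_t word) {
    if ((word & 1u) == 0) {
        return false;
    }
    uint32_t pc = word & ~1u;
    uint16_t hi, lo;
    /* 32-bit BL <imm>: first halfword 11110xxx..., second 11x1xxxx... */
    if (pc >= 4 && example_read_halfword(f, pc - 4, &hi) &&
        example_read_halfword(f, pc - 2, &lo) &&
        (hi & 0xF800u) == 0xF000u && (lo & 0xD000u) == 0xD000u) {
        return true;
    }
    /* 16-bit BLX <Rm>: 0100 0111 1mmm m000 */
    if (pc >= 2 && example_read_halfword(f, pc - 2, &lo) &&
        (lo & 0xFF87u) == 0x4780u) {
        return true;
    }
    return false;
}

/* Scans `words` stack words and stores up to `max_out` candidate return
 * addresses (Thumb bit cleared) in `out`. Returns the number stored. */
size_t example_scan_stack(const example_flash_t* f, const uint32_t* stack,
                          size_t words, uint32_t* out, size_t max_out) {
    size_t n = 0;
    for (size_t i = 0; i < words && n < max_out; i++) {
        if (example_is_return_address(f, stack[i])) {
            out[n++] = stack[i] & ~1u;
        }
    }
    return n;
}
```

Because this only pattern-matches stack words, it can report stale or coincidental values as frames; the trade-off is that it needs no unwind tables or frame pointers.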
It would be useful to store per-thread stack traces in addition to checkpoint info whenever the device enters a hardfault, a panic state, or on demand. This will provide a better overview of the system state and help understand the cause of the crash or deadlock for difficult to reproduce issues.
System code may use the following macros defined in tracer_service.h:
- TRACER_CHECKPOINT() - regular instruction address type checkpoint
- TRACER_PANIC_CHECKPOINT() - special instruction address type checkpoint which should only be called from a panic handler, which will prevent further modification of diagnostic data
- TRACER_CRASH_CHECKPOINT(pc) - special instruction address type checkpoint which should only be called from a crash (hardfault) handler, which will prevent further modification of diagnostic data. An instruction address needs to be manually passed as an argument.
- TRACER_UPDATE() - forces a full system state update (including stacktraces).
All LOG statements, except for LOG_DUMP, make a call to TRACER_CHECKPOINT().
Both hardfault and panic handlers make a call to their respective macros: TRACER_CRASH_CHECKPOINT(), TRACER_PANIC_CHECKPOINT()
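The difference between a regular checkpoint and the panic/crash variants listed above can be sketched like this. It is a simplified model under stated assumptions, not the code in this PR: `example_tracer_t` and the `example_*` functions are hypothetical, the store is plain memory rather than retained RAM, and returning an error once locked is an assumption (the real implementation may ignore later checkpoints silently).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical checkpoint store. */
typedef struct {
    uint32_t address;  /* instruction address of the last checkpoint */
    bool locked;       /* set once a panic or crash checkpoint is saved */
} example_tracer_t;

typedef enum {
    EXAMPLE_OK = 0,
    EXAMPLE_ERROR_LOCKED = -1
} example_result_t;

/* Regular checkpoint: overwrites the previous one unless the store is locked. */
example_result_t example_checkpoint(example_tracer_t* t, uint32_t pc) {
    if (t->locked) {
        return EXAMPLE_ERROR_LOCKED;
    }
    t->address = pc;
    return EXAMPLE_OK;
}

/* Panic/crash checkpoint: records `pc`, then freezes the diagnostic data so
 * that code running afterwards cannot overwrite the fault location. */
example_result_t example_crash_checkpoint(example_tracer_t* t, uint32_t pc) {
    example_result_t r = example_checkpoint(t, pc);
    if (r == EXAMPLE_OK) {
        t->locked = true;
    }
    return r;
}
```

Locking on the first fault checkpoint means the data published after the next boot points at the original failure rather than at whatever ran during the fault handling path.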
Application code may use the following macros defined in spark_wiring_tracer.h:
- CHECKPOINT() - a standard variant of CHECKPOINT() macro. Implementation depends on a preprocessor macro TRACER_ELF_AVAILABLE.
  - CHECKPOINT() macro internally calls TRACER_CHECKPOINT(), making it an instruction address type checkpoint
  - CHECKPOINT() macro internally makes a call to TRACER_TEXT_CHECKPOINT(), which saves the location of the checkpoint in textual format: __FILENAME__:__LINE__
- CHECKPOINT(text) - a specialized variant of CHECKPOINT() macro, which forces a textual checkpoint with the text passed as an argument
All Logger class calls, except for print, printf, write and dump, internally include a call to _LOG_CHECKPOINT() macro defined in spark_wiring_diagnostic.h, which is only enabled when TRACER_ELF_AVAILABLE == 1. It acquires an address of the calling function and uses that as the checkpoint instruction address.
Steps to Test
app/checkpoint (see README.md)
Example App
app/checkpoint
References
Completeness