Skip to content
This repository has been archived by the owner on Feb 1, 2024. It is now read-only.

Google IoT client's memory limiter conflicts with ESP-IDF heap corruption detector (CA-102) #18

Open
DaStoned opened this issue Nov 18, 2020 · 6 comments

Comments

@DaStoned
Copy link

Hi!

I'm using the excellent Google IoT Core client port from https://github.com/espressif/esp-google-iot. The CMakeLists.txt in this project enables the memory limiter (-DIOTC_MEMORY_LIMITER_ENABLED).

This option seems to conflict rather badly with ESP-IDF v 4.1 heap corruption detection (enabled by HEAP_POISONING_LIGHT or HEAP_POISONING_COMPREHENSIVE). When both the limiter and detector are enabled at the same time (as is the default), then under some circumstances freeing allocated heap memory triggers ESP-IDF heap corruption detector (which reboots the system).

I see that when memory is free()-d by the mbedtls library (e.g. when a TLS session dies due to an error) it fails with this message:

D (18726) mbedtls: ssl_tls.c:6336 => handshake wrapup                                                                                                                                                                                                                         [439/134558]
D (18726) mbedtls: ssl_tls.c:6309 => handshake wrapup: final free
CORRUPT HEAP: Bad head at 0x3ffbdcc0. Expected 0xabba1234 got 0x3ffbdcf4
abort() was called at PC 0x40082aca on core 0
0x40082aca: lock_acquire_generic at /home/tarmo/myproject/lib/src/esp-idf/components/newlib/locks.c:143

ELF file SHA256: 3ed885ca66e80d50
Backtrace: 0x40086dbd:0x3ffd1210 0x40087151:0x3ffd1230 0x40082aca:0x3ffd1250 0x40082bed:0x3ffd1280 0x4019d901:0x3ffd12a0 0x40194e75:0x3ffd1560 0x40194d51:0x3ffd15b0 0x4008c42b:0x3ffd15e0 0x4008238e:0x3ffd1600 0x4008f3b9:0x3ffd1620 0x400e934d:0x3ffd1640 0x400e5355:0x3ffd1660 0x400e4
1b0:0x3ffd1680 0x40137c2a:0x3ffd16a0 0x4012c497:0x3ffd16c0 0x40130b8c:0x3ffd16e0 0x4012ef59:0x3ffd1700 0x4012f16d:0x3ffd1720 0x40123a5c:0x3ffd1740 0x40123a9d:0x3ffd1760 0x40123b70:0x3ffd1780 0x4012976e:0x3ffd17a0 0x40123979:0x3ffd17c0 0x401239cb:0x3ffd17e0 0x4011d649:0x3ffd1800 0x4
011ce79:0x3ffd1820 0x4011d0db:0x3ffd1840 0x4011d1ab:0x3ffd1870 0x4017925b:0x3ffd18a0 0x401b26a1:0x3ffd18c0 0x4011ef6f:0x3ffd18e0 0x4011f1b5:0x3ffd1910 0x400ea356:0x3ffd1930 0x400ea4d4:0x3ffd1950 0x400ea7a7:0x3ffd1980 0x400e21d5:0x3ffd19b0 0x400e22b3:0x3ffd1a40 0x400e2326:0x3ffd1a80
 0x40088399:0x3ffd1ab0
0x40086dbd: invoke_abort at /home/tarmo/myproject/lib/src/esp-idf/components/esp32/panic.c:157
0x40087151: abort at /home/tarmo/myproject/lib/src/esp-idf/components/esp32/panic.c:174
0x40082aca: lock_acquire_generic at /home/tarmo/myproject/lib/src/esp-idf/components/newlib/locks.c:143
0x40082bed: _lock_acquire_recursive at /home/tarmo/myproject/lib/src/esp-idf/components/newlib/locks.c:171
0x4019d901: _vfiprintf_r at /builds/idf/crosstool-NG/.build/xtensa-esp32-elf/src/newlib/newlib/libc/stdio/vfprintf.c:853 (discriminator 2)
0x40194e75: fiprintf at /builds/idf/crosstool-NG/.build/xtensa-esp32-elf/src/newlib/newlib/libc/stdio/fiprintf.c:48
0x40194d51: __assert_func at /builds/idf/crosstool-NG/.build/xtensa-esp32-elf/src/newlib/newlib/libc/stdlib/assert.c:58 (discriminator 8)
0x4008c42b: multi_heap_free at /home/tarmo/myproject/lib/src/esp-idf/components/heap/multi_heap_poisoning.c:266 (discriminator 1)
0x4008238e: heap_caps_free at /home/tarmo/myproject/lib/src/esp-idf/components/heap/heap_caps.c:272
0x4008f3b9: free at /home/tarmo/myproject/lib/src/esp-idf/components/newlib/heap.c:47
0x400e934d: iotc_bsp_mem_free at /home/tarmo/myproject/src/fv_comms/build/../components/gcp_iot_sdk/port/src/iotc_bsp_mem_posix.c:28
0x400e5355: __iotc_free at /home/tarmo/myproject/src/fv_comms/build/../components/gcp_iot_sdk/iot-device-sdk-embedded-c/src/libiotc/memory/iotc_allocator.c:39
0x400e41b0: iotc_memory_limiter_free at /home/tarmo/myproject/src/fv_comms/build/../components/gcp_iot_sdk/iot-device-sdk-embedded-c/src/libiotc/debug_extensions/memory_limiter/iotc_memory_limiter.c:341
0x40137c2a: mbedtls_free at /home/tarmo/myproject/lib/src/esp-idf/components/mbedtls/mbedtls/library/platform.c:71
0x4012c497: mbedtls_mpi_free at /home/tarmo/myproject/lib/src/esp-idf/components/mbedtls/mbedtls/library/bignum.c:111
0x40130b8c: mbedtls_ecp_point_free at /home/tarmo/myproject/lib/src/esp-idf/components/mbedtls/mbedtls/library/ecp.c:592
0x4012ef59: ecdh_free_internal at /home/tarmo/myproject/lib/src/esp-idf/components/mbedtls/mbedtls/library/ecdh.c:233
0x4012f16d: mbedtls_ecdh_free at /home/tarmo/myproject/lib/src/esp-idf/components/mbedtls/mbedtls/library/ecdh.c:265
0x40123a5c: mbedtls_ssl_handshake_free at /home/tarmo/myproject/lib/src/esp-idf/components/mbedtls/mbedtls/library/ssl_tls.c:8848
0x40123a9d: ssl_handshake_wrapup_free_hs_transform at /home/tarmo/myproject/lib/src/esp-idf/components/mbedtls/mbedtls/library/ssl_tls.c:6314
0x40123b70: mbedtls_ssl_handshake_wrapup at /home/tarmo/myproject/lib/src/esp-idf/components/mbedtls/mbedtls/library/ssl_tls.c:6387
0x4012976e: mbedtls_ssl_handshake_client_step at /home/tarmo/myproject/lib/src/esp-idf/components/mbedtls/mbedtls/library/ssl_cli.c:3626
0x40123979: mbedtls_ssl_handshake_step at /home/tarmo/myproject/lib/src/esp-idf/components/mbedtls/mbedtls/library/ssl_tls.c:8064
0x401239cb: mbedtls_ssl_handshake at /home/tarmo/myproject/lib/src/esp-idf/components/mbedtls/mbedtls/library/ssl_tls.c:8088
0x4011d649: esp_mbedtls_handshake at /home/tarmo/myproject/lib/src/esp-idf/components/esp-tls/esp_tls_mbedtls.c:107
0x4011ce79: esp_tls_handshake at /home/tarmo/myproject/lib/src/esp-idf/components/esp-tls/esp_tls.c:78
0x4011d0db: esp_tls_low_level_conn at /home/tarmo/myproject/lib/src/esp-idf/components/esp-tls/esp_tls.c:295 (discriminator 6)
0x4011d1ab: esp_tls_conn_new_sync at /home/tarmo/myproject/lib/src/esp-idf/components/esp-tls/esp_tls.c:346
0x4017925b: ssl_connect at /home/tarmo/myproject/lib/src/esp-idf/components/tcp_transport/transport_ssl.c:74
0x401b26a1: esp_transport_connect at /home/tarmo/myproject/lib/src/esp-idf/components/tcp_transport/transport.c:165
0x4011ef6f: esp_http_client_connect at /home/tarmo/myproject/lib/src/esp-idf/components/esp_http_client/esp_http_client.c:1023
0x4011f1b5: esp_http_client_open at /home/tarmo/myproject/lib/src/esp-idf/components/esp_http_client/esp_http_client.c:1179
0x400ea356: _http_connect at /home/tarmo/myproject/lib/src/esp-idf/components/esp_https_ota/src/esp_https_ota.c:98
0x400ea4d4: esp_https_ota_begin at /home/tarmo/myproject/lib/src/esp-idf/components/esp_https_ota/src/esp_https_ota.c:173
0x400ea7a7: esp_https_ota at /home/tarmo/myproject/lib/src/esp-idf/components/esp_https_ota/src/esp_https_ota.c:393
0x400e21d5: fvcomms::OtaHub::updateApp(char const*) at /home/tarmo/myproject/src/fv_comms/build/../main/OtaHub.cpp:177
0x400e22b3: fvcomms::OtaHub::run(bool const&) at /home/tarmo/myproject/src/fv_comms/build/../main/OtaHub.cpp:94 (discriminator 5)
0x400e2326: operator() at /home/tarmo/myproject/src/fv_comms/build/../main/OtaHub.cpp:36
 (inlined by) _FUN at /home/tarmo/myproject/src/fv_comms/build/../main/OtaHub.cpp:38
0x40088399: vPortTaskWrapper at /home/tarmo/myproject/lib/src/esp-idf/components/freertos/port.c:143

My hypothesis is that multi_heap_free() (on multi_heap_poisoning.c:266) receives a pointer to a "heap poison" header structure generated by iotc_memory_limiter.c instead of its own poison. Since IOTC's header doesn't start with the magic 0xabba1234, it declares heap corrupt.

I would speculate that maybe the sequence of two free()-s would have to be carried out in reverse to the sequence of malloc()-s.

Testing indicates that disabling IOTC's memory limiter fixes the problem. I humbly suggest disabling it by default in https://github.com/espressif/esp-google-iot

@github-actions github-actions bot changed the title Google IoT client's memory limiter conflicts with ESP-IDF heap corruption detector Google IoT client's memory limiter conflicts with ESP-IDF heap corruption detector (CA-102) Nov 18, 2020
@supreetd21
Copy link
Collaborator

Hi @DaStoned,

Thank you for reporting the issue and the detailed analysis.
I tried reproducing your issue by running the smart outlet example with one of the heap poisoning options on IDF v4.1.
I could not get to the point of the above failure.

Can you please provide your sdkconfig if possible?
Does this issue occur in any specific way of terminating the mbedtls session?

@DaStoned
Copy link
Author

DaStoned commented Nov 30, 2020

Hi @supreetd21,

I was able to reproduce the problem in 60-80 percent of cases when doing this:

  • Wait for Wifi to come online, then start a few processes simultaneously in different tasks:
    • Start downloading a file (the firmware update) via HTTPS using esp_https_ota library.
    • Get current time from an NTP server, followed by a connection to Google IoT Core using iot-device-sdk-embedded-c library

I only observed the problem when starting them in parallel. When starting the HTTPS download later (after IoT Core connection was made), there were no problems.

The issue triggered specifically when HTTPS completed. It seems to me that HTTPS download failing (e.g through timeout) or completing was the cause, as this shut down the TLS connection to HTTPS server and produced the assert.

My sdkconfig.txt. I switched to CONFIG_HEAP_POISONING_COMPREHENSIVE when encountering problems.

The HTTPS client and Google IoT Core together manage to consume a lot of heap, but the "high water mark" of heap consumption always shows over 60 KiB remaining when the issue is not triggered (i.e. all tasks complete successfully). Also the "high water mark" of stacks for all involved tasks have at least 1 KiB remaining. So I don't think I ran into an actual memory depletion problem.

I can produce more detailed log output, if needed.

@supreetd21
Copy link
Collaborator

Hi, @DaStoned,
Sorry for the slow turnaround.
Thank you for the analysis and the detailed procedure, the issue could be readily reproduced with your procedure.

I could reproduce the issue even when heap poisoning was disabled. On further inspection, I noticed the root cause, very similar to your hypothesis:

Background:

  • While TLS initialisation in the Mqtt thread, the sdk calls mbedtls_platform_set_calloc_free() in iotc_bsp_tls_mbedtls.c.
    It replaces the TLS memory allocation function (mbedtls_alloc()/mbedtls_free()) implementations with google IoT’s memory allocation functions.

  • With the memory limiter, an intermediate function (iotc_memory_limiter_alloc()) adds some meta-data to the allocated pointer and returns an offseted pointer. The subsequent free function(iotc_memory_limiter_alloc) de-offsets while freeing the memory.

Issue:

  • When another task uses mbedtls_alloc before the Mqtt thread replaces the allocation functions, a vanilla memory allocation happens (no meta-data/offset).
  • Although by the time free is called the TLS alloc function replacement is complete andiotc_memory_limiter_free is called which de-offsets the incoming vanilla pointer and causes an assert.

Without memory limiter, the replaced functions act same as vanilla memory allocations. (port/src/iotc_bsp_mem_posix.c)

  • As suggested, will add a fix to disable the memory limiter by default.

Note:

  • If memory limiter is still to be used, the best practice would be to ensure that mbedtls_platform_set_calloc_free() in iotc_bsp_tls_init() is completed before other tasks start TLS allocations.
  • Also, keep note that as a side effect the TLS allocations in other tasks will be routed through the Google SDK’s intermediate memory functions.

@DaStoned
Copy link
Author

Thank you for the explanation. I'm fortunate to not require the memory limiter functionality (or so I believe), so I can just ignore it.

Otherwise it would be quite tricky to avoid such race conditions reliably in application code. Effectively, TLS operations become a mutually exclusive resource :) I'd have to stop all other TLS tasks while iotc library does connect or re-connect, or something to this tune.

@supreetd21
Copy link
Collaborator

Hi, @DaStoned
Do you want to report this as feedback at google IoT sdk?

@DaStoned
Copy link
Author

I suppose so: GoogleCloudPlatform/iot-device-sdk-embedded-c#115

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants