Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AP_Filesystem: ROMFS: fix open race conditions #29201

Merged
merged 1 commit into from
Feb 1, 2025

Conversation

tpwrules
Copy link
Contributor

@tpwrules tpwrules commented Jan 31, 2025

Lua opens scripts to load them into memory, then the logger opens them after to stream them into the dataflash log. When loading multiple large Lua scripts from ROMFS, decompression takes a significant amount of time. This creates the opportunity for the Lua interpreter and logging threads to both be inside AP_Filesystem_ROMFS::open() decompressing a file.

If this happens, the function can return the same fd for two different calls as the fd is chosen before decompression starts, but only marked as being used after that finishes. The read pointers then stomp on each other, so Lua loads garbled scripts (usually resulting in a syntax error) and the logger dumps garbled data.

Fix the issue by locking before searching for a free record (or marking a record as free). Apply the same fix to directories as well. This doesn't protect against using the same fd/dirp from multiple threads, but that behavior is to be discouraged anyway and is not the root cause here.

Patch to view the evidence:

diff --git a/libraries/AP_Filesystem/AP_Filesystem_ROMFS.cpp b/libraries/AP_Filesystem/AP_Filesystem_ROMFS.cpp
index f481d87255..2fbde94d93 100644
--- a/libraries/AP_Filesystem/AP_Filesystem_ROMFS.cpp
+++ b/libraries/AP_Filesystem/AP_Filesystem_ROMFS.cpp
@@ -26,6 +26,8 @@
 #include <AP_Math/AP_Math.h>
 #include <AP_ROMFS/AP_ROMFS.h>
 
+#include <stdio.h>
+
 int AP_Filesystem_ROMFS::open(const char *fname, int flags, bool allow_absolute_paths)
 {
     if ((flags & O_ACCMODE) != O_RDONLY) {
@@ -52,6 +54,7 @@ int AP_Filesystem_ROMFS::open(const char *fname, int flags, bool allow_absolute_
         return -1;
     }
     file[idx].ofs = 0;
+    printf("open %d\n", idx);
     return idx;
 }
 
@@ -63,6 +66,7 @@ int AP_Filesystem_ROMFS::close(int fd)
     }
     AP_ROMFS::free(file[fd].data);
     file[fd].data = nullptr;
+    printf("close %d\n", fd);
     return 0;
 }

The evidence (fd 0 is opened twice):

AP: Scripting: stopped
AP: Scripting: restarted
Lua: State memory usage: 2796 + 5897
open 0
close 0
open 0
close 0
open 0
open 0
close 0
Lua: Error: @ROMFS/scripts/script2.lua:482: 'end' expected (to close 'function' at line 467) near <eof>
AP: Lua: Error: @ROMFS/scripts/script2.lua:482: 'end' expected (to close 'function' at line 467) near <eof>

Copy link
Contributor

@tridge tridge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very good find!!
I'd prefer just to put the mutex at the start of open(). We only call open in places where timing is not critical, so I think simplicity is more important than efficiency, and it would be a more obvious patch for 4.6.x

Copy link
Collaborator

@robertlong13 robertlong13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested on CubeOrange and it is working. 80 reloads with no issue. Rolling back I can reproduce every 5 attempts or so.

Lua opens scripts to load them into memory, then the logger opens them
after to stream them into the dataflash log. When loading multiple large
Lua scripts from ROMFS, decompression takes a significant amount of
time. This creates the opportunity for the Lua interpreter and logging
threads to both be inside `AP_Filesystem_ROMFS::open()` decompressing a
file.

If this happens, the function can return the same `fd` for two different
calls as the `fd` is chosen before decompression starts, but only marked
as being used after that finishes. The read pointers then stomp on each
other, so Lua loads garbled scripts (usually resulting in a syntax
error) and the logger dumps garbled data.

Fix the issue by locking before searching for a free record (or marking
a record as free). Apply the same fix to directories as well. This
doesn't protect against using the same `fd`/`dirp` from multiple
threads, but that behavior is to be discouraged anyway and is not the
root cause here.
@peterbarker peterbarker merged commit 63afcae into ArduPilot:master Feb 1, 2025
100 checks passed
@tpwrules tpwrules deleted the pr/romfs-race branch February 1, 2025 16:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Pending
Development

Successfully merging this pull request may close these issues.

4 participants