luzer: reserve and handoff ctrs to lf #16

azanegin · 2023-12-29T19:40:10Z

Until now, luzer did not use coverage information for interpreted code at all. Hook-based instrumentation collected data, but it was never passed to libFuzzer to draw features from. Memory was always allocated with a fixed default kMax... size. This commit includes a fix that properly passes the counters to libFuzzer, and two ways to approximate the optimal number of 8-bit counters: one based on a testing pre-run phase, and one based on the active bytecode size. It also includes a minor fix to signal handling.

Fixes #12

Until now, luzer did not use coverage information for interpreted code at all. Hook-based instrumentation collected data, but it was never passed to libFuzzer to draw features from. Memory was always allocated with a fixed default kMax... size. This commit includes a fix that properly passes the counters to libFuzzer, and two ways to approximate the optimal number of 8-bit counters: one based on a testing pre-run phase, and one based on the active bytecode size. Changes to the signatures of the counter functions help fix bugs with signed arithmetic. It also includes a minor fix to signal handling, and parameter renames to avoid shadowing global variables. Fixes ligurio#12
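To make the hand-off concrete, here is a minimal sketch in C of the general idea: a hook picks an 8-bit counter from the current source location and bumps it, and the counter array is the memory libFuzzer would be told about. The names `location_to_index` and `bump_counter` and the hash constant are hypothetical and not taken from luzer. In a real harness the array would be registered through `__sanitizer_cov_8bit_counters_init`; that call is left out so the sketch compiles without libFuzzer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Map a (file hash, line) pair to a counter slot.
 * All arithmetic is unsigned, so the result is always in [0, num_counters).
 * The multiplier 31 is an arbitrary choice for this sketch.
 */
size_t
location_to_index(size_t file_hash, size_t line, size_t num_counters)
{
	assert(num_counters > 0);
	return (file_hash * 31u + line) % num_counters;
}

/*
 * Increment a counter, saturating at 255.
 * Assumption of this sketch: a counter that wraps back to zero would
 * look like "never executed", so it is better to stop at the maximum.
 */
void
bump_counter(uint8_t *counters, size_t idx)
{
	if (counters[idx] < UINT8_MAX)
		counters[idx]++;
}
```

The number of counters is the quantity the two estimation strategies in this patch try to approximate: too few counters make unrelated locations share a slot, too many waste memory.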

ligurio

Alex, thanks for your patch!
I did an initial review, please take a look at my comments. In general, I like the idea, but we need to polish the implementation.

ligurio · 2024-02-09T12:27:08Z

luzer/io.cc

+ * What's worse than using non-public-API is using C++. But this project already
+ * uses clang++ with 'fuzzed_data_provider.cc'. Hey, libfuzzer IS written in C++.


I don't like the approach. Please rewrite it in C.

Would you be okay with C-binding-to-a-mangled-Cpp-symbol-of-libfuzzer, or do you mean "write new, non-libfuzzer IO code in C without using fuzzer::ReadDirToVectorOfUnits()"? I could do the former easily, but the latter would require significant work to be cross-platform.

Would you be okay with C-binding-to-a-mangled-Cpp-symbol-of-libfuzzer, or do you mean "write new, non-libfuzzer IO code in C without using fuzzer::ReadDirToVectorOfUnits()"?

I would prefer a plain C variant, but C-binding-to-a-mangled-Cpp-symbol-of-libfuzzer would be okay too.

ligurio · 2024-02-09T12:27:39Z

luzer/counters.c

-int counter_index = 0;
+size_t counter_index = 0;
 // Number of counters given to Libfuzzer.


I would split this commit into several commits:

- change datatype int -> size_t
- fix "__sanitizer_cov_8bit_counters_init never invoked for interpreter" (#12)
- patch that adds NO_SANITIZE
- ...
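The int -> size_t change matters because of how C treats signed arithmetic. A small sketch, using hypothetical helper names that do not appear in the patch: with a signed value, `%` can produce a negative result, which is an out-of-bounds index, while the same computation on `size_t` always stays within `[0, n)`.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Signed remainder keeps the sign of the dividend (C99 and later),
 * so a negative input gives a negative "index".
 */
int
slot_signed(int value, int n)
{
	assert(n > 0);
	return value % n;
}

/*
 * Unsigned remainder is always in [0, n), whatever the input bits are.
 */
size_t
slot_unsigned(size_t value, size_t n)
{
	assert(n > 0);
	return value % n;
}
```

For example, `slot_signed(-7, 4)` evaluates to `-3`, whereas `slot_unsigned((size_t)-7, 4)` is a valid slot below 4.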

ligurio · 2024-02-09T12:38:29Z

luzer/io.cc

+		/*Epoch = */nullptr,
+		/*MaxSize = */SIZE_MAX,
+		/*ExitOnError = */false,
+		/*VPaths = */nullptr


How so? They are not optional arguments, and this function call is the most important thing in this file. Do you mean inline-commented argument names? This is for readability. Should I remove them?

How so? They are not optional arguments, and this function call is the most important thing in this file. Do you mean inline-commented argument names? This is for readability.

Sorry, I overlooked the actual code because of the unusual comment style.

Should I remove them?

I would rewrite it to:

fuzzer::ReadDirToVectorOfUnits(dirpath, &seed_corpus, /*Epoch */ nullptr, /*MaxSize */ SIZE_MAX, /*ExitOnError */ false, /*VPaths */ nullptr);

to avoid confusion.
Or even add a prototype with self-explanatory argument names in a comment:

/*
 * void ReadDirToVectorOfUnits(const char *Path, std::vector<Unit> *V,
 *     long *Epoch, size_t MaxSize, bool ExitOnError);
 */

ligurio · 2024-02-09T12:38:59Z

luzer/luzer.c

@@ -26,6 +26,8 @@
 #include "version.h"
 #include "luzer.h"

+#define GLOBAL_BYTECODE_TO_COUNTERS_SCALE 4


This requires a comment
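A comment of the requested kind could look like the sketch below. It assumes the constant is a divisor applied to the total dumped-bytecode size to get a counter count. The excerpt does not show how the constant is actually used, so the direction of the scaling, the helper name `counters_from_bytecode_size`, and the minimum of one counter are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * ASSUMED meaning, for illustration only: how many bytes of dumped
 * Lua bytecode are represented by one 8-bit coverage counter.
 * A larger value reserves fewer counters for the same amount of code.
 */
#define GLOBAL_BYTECODE_TO_COUNTERS_SCALE 4

/* Dividing by zero would be a bug, so reject it at compile time. */
static_assert(GLOBAL_BYTECODE_TO_COUNTERS_SCALE > 0,
	"scale must be positive");

/*
 * Hypothetical helper: turn an estimated bytecode size into a number
 * of counters, never returning zero so that an array can be allocated.
 */
size_t
counters_from_bytecode_size(size_t bytecode_size)
{
	size_t n = bytecode_size / GLOBAL_BYTECODE_TO_COUNTERS_SCALE;
	return n > 0 ? n : 1;
}
```

Under these assumptions, 4096 bytes of bytecode would reserve 1024 counters.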

ligurio · 2024-02-09T12:43:32Z

luzer/luzer.c

+			"for k, v in pairs(table_to_count) do\n"
+				"if type(v) == 'function' and what(v) == 'Lua' then\n"
+					"-- we dont care for already-seen funcs\n"
+					"bytecode_size = bytecode_size + string.len(string.dump(v))\n"


I believe debug information should be stripped (string.dump(v, 1)).

ligurio · 2024-02-09T12:47:31Z

luzer/luzer.c

+ * Basically, this is stupid and straigtforward - table tree walk from '_G'.
+ * '_G' is Lua's special table for global stuff.
+ * 'string.dump' works even in latest LuaJIT. Bytecode is not crossplatform but we don't
+ * need that.


I see a limitation of this approach: all Lua modules must be loaded before running the fuzzing process. Right?

Right. In theory, we could update counters at runtime, but I would need to test whether libFuzzer (LF) is okay with that. To be honest, I guess the second estimation strategy is better when a lot of code is loaded dynamically.

The much bigger limitation right now, as I see it, is local: functions stored only in local variables are not reachable by walking '_G'.

ligurio · 2024-02-09T12:49:21Z

luzer/luzer.c

+ * And C implementation would require much more time.
+ */
+NO_SANITIZE static inline __attribute__((unused)) int
+lua_approx_global_bytecode_size(lua_State *L)


Rename to something like lua_estimate_global_functions_bc_size.

ligurio · 2024-02-09T12:54:01Z

luzer/luzer.c

+ * This also can be written in C, but I see no reason for it. It should run only once.
+ * And C implementation would require much more time.


To be honest, I don't like the current implementation. However, I don't know what would be better. I agree that rewriting it with the Lua C API would probably be a waste of time and less maintainable. We should probably put the Lua code in a separate file and embed it at build time, see [^1] and [^2].

This Lua function could be loaded at the initialization stage, like luaL_set_custom_mutator, and exported in the luzer module, see [^3].

[^1]: ligurio/lua-c-api-tests@719cf90

[^2]: ligurio/lua-c-api-tests@ba02938#diff-7fc5f49402ccaaa71949368218a21f5ee991358185ac6b01531b662259f9d585

[^3]: https://github.com/ligurio/luzer/blob/f37950549bca59330117cbb44943b1984ab98b2c/luzer/luzer.c#L429C27-L429C50

ligurio · 2024-02-09T12:57:51Z

CHANGELOG.md

+- Two ways to approximate amount of counters for interpreted code.
+
+### Fixed
+- Interpreted code counter never handed to libfuzzer. (#12)
+- Bad lifetime and initization of struct sigaction.


One commit - one changelog entry, please

azanegin force-pushed the fix-counters branch from e58c5a1 to 8c83666 on December 30, 2023 17:19

azanegin force-pushed the fix-counters branch from 8c83666 to d0ba90a on January 24, 2024 16:47

azanegin force-pushed the fix-counters branch from d0ba90a to 9934011 on January 28, 2024 20:59

ligurio reviewed Feb 9, 2024



