-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
luzer: reserve and handoff ctrs to lf #16
base: master
Are you sure you want to change the base?
Conversation
e58c5a1
to
8c83666
Compare
8c83666
to
d0ba90a
Compare
Until now, luzer had not used at all coverage information for interpreted code. Hook-based instrumentation collected data, but it were never passed to libfuzzer to drew features from. Memory always were allocated in a fixed default kMax... size. This commit includes a fix to properly pass counters to libfuzzer, two systems to approximate optimal amount of 8-bit counters: one based on testing, pre-run phase, and one based on active bytecode size. Changes to signatures of counter functions help fix bugs with sign arithmetic. Also, a minor fix to signal handling and parameter name changes to evade name shadowing of global variables. Fixes ligurio#12
d0ba90a
to
9934011
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alex, thanks for your patch!
I did an initial review, please take a look n my comments. In general, I like an idea, but we need to polish an implementation.
* What's worse than using non-public-API is using C++. But this project already | ||
* uses clang++ with 'fuzzed_data_provider.cc'. Hey, libfuzzer IS written in C++. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like the approach. Please rewrite to C.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you be okay with C-binding-to-a-mangled-Cpp-symbol-of-libfuzzer or do you mean "write new, non-libfuzzer IO code in C without using fuzzer::ReadDirToVectorOfUnits()
? I could do the former easily, but the latter would require significant work to be cross-platform.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you be okay with C-binding-to-a-mangled-Cpp-symbol-of-libfuzzer or do you mean "write new, non-libfuzzer IO code in C without using fuzzer::ReadDirToVectorOfUnits()?
I would prefer a plain C variant, but C-binding-to-a-mangled-Cpp-symbol-of-libfuzzer would be okay too.
int counter_index = 0; | ||
size_t counter_index = 0; | ||
// Number of counters given to Libfuzzer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would split this commit to a number of commits:
- change datatype
int
->size_t
- fix __sanitizer_cov_8bit_counters_init never invoked for interpreter #12
- patch that adds
NO_SANITIZE
- ...
/*Epoch = */nullptr, | ||
/*MaxSize = */SIZE_MAX, | ||
/*ExitOnError = */false, | ||
/*VPaths = */nullptr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dead code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How so? They are not optional arguments, and this function call is the most important thing in this file. Do you mean inline-commented argument names? This is for readability. Should I remove them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How so? They are not optional arguments, and this function call is the most important thing in this file. Do you mean inline-commented argument names? This is for readability.
Sorry, overlooked a real code due to non-usual comment style.
Should I remove them?
I would rewrite it to:
fuzzer::ReadDirToVectorOfUnits(
dirpath,
&seed_corpus,
/*Epoch */
nullptr,
/*MaxSize */
SIZE_MAX,
/*ExitOnError */
false,
/*VPaths */
nullptr
};
to avoid confusion.
Or even add a prototype with self-explained names of arguments to a comment:
/*
* void ReadDirToVectorOfUnits(const char *Path, std::vector<Unit> *V,
* long *Epoch, size_t MaxSize, bool ExitOnError);
*/
@@ -26,6 +26,8 @@ | |||
#include "version.h" | |||
#include "luzer.h" | |||
|
|||
#define GLOBAL_BYTECODE_TO_COUNTERS_SCALE 4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This requires a comment
"for k, v in pairs(table_to_count) do\n" | ||
"if type(v) == 'function' and what(v) == 'Lua' then\n" | ||
"-- we dont care for already-seen funcs\n" | ||
"bytecode_size = bytecode_size + string.len(string.dump(v))\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe debug information should be stripped (string.dump(v, 1)
).
* Basically, this is stupid and straigtforward - table tree walk from '_G'. | ||
* '_G' is Lua's special table for global stuff. | ||
* 'string.dump' works even in latest LuaJIT. Bytecode is not crossplatform but we don't | ||
* need that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see a limitation of this approach: all Lua modules must be loaded before running the fuzzing process. Right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. In theory, we could update counters at runtime, but I would need to test if LF is okay with that. Tbh I guess second estimation strategy is better for the case when a lot of code is loaded dynamically.
The much bigger limitation r/n as I see it is local
.
* And C implementation would require much more time. | ||
*/ | ||
NO_SANITIZE static inline __attribute__((unused)) int | ||
lua_approx_global_bytecode_size(lua_State *L) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename to something like lua_estimate_global_functions_bc_size
.
* This also can be written in C, but I see no reason for it. It should run only once. | ||
* And C implementation would require much more time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honestly, I don't like a current implementation. However, I don't know what would be better. Agree, that rewriting to Lua C will probably a waste of time and probably be less maintainable. Probably, we should put a Lua code to a separate file and embed it on build stage, see [^1] and [^2].
This Lua function could be loaded on initial stage like luaL_set_custom_mutator
and exported in luzer
module, see [^3].
- Two ways to approximate amount of counters for interpreted code. | ||
|
||
### Fixed | ||
- Interpreted code counter never handed to libfuzzer. (#12) | ||
- Bad lifetime and initization of struct sigaction. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One commit - one changelog entry, please
Until now, luzer had not used at all coverage information for interpreted code. Hook-based instrumentation collected data, but it were never passed to libfuzzer to drew features from. Memory always were allocated in a fixed default kMax... size. This commit includes a fix to properly pass counters to libfuzzer, two systems to approximate optimal amount of 8-bit counters: one based on testing, pre-run phase, and one based on active bytecode size. Also, a minor fix to signal handling.
Fixes #12