Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

luzer: reserve and handoff ctrs to lf #16

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

azanegin
Copy link
Contributor

Until now, luzer had not used at all coverage information for interpreted code. Hook-based instrumentation collected data, but it were never passed to libfuzzer to drew features from. Memory always were allocated in a fixed default kMax... size. This commit includes a fix to properly pass counters to libfuzzer, two systems to approximate optimal amount of 8-bit counters: one based on testing, pre-run phase, and one based on active bytecode size. Also, a minor fix to signal handling.

Fixes #12

Until now, luzer had not used at all coverage information for
interpreted code. Hook-based instrumentation collected data, but
it were never passed to libfuzzer to drew features from. Memory
always were allocated in a fixed default kMax... size. This
commit includes a fix to properly pass counters to libfuzzer,
two systems to approximate optimal amount of 8-bit counters:
one based on testing, pre-run phase, and one based on active bytecode
size.
Changes to signatures of counter functions help fix bugs with
sign arithmetic.
Also, a minor fix to signal handling and parameter name changes
to evade name shadowing of global variables.

Fixes ligurio#12
Copy link
Owner

@ligurio ligurio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alex, thanks for your patch!
I did an initial review, please take a look n my comments. In general, I like an idea, but we need to polish an implementation.

Comment on lines +15 to +16
* What's worse than using non-public-API is using C++. But this project already
* uses clang++ with 'fuzzed_data_provider.cc'. Hey, libfuzzer IS written in C++.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the approach. Please rewrite to C.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you be okay with C-binding-to-a-mangled-Cpp-symbol-of-libfuzzer or do you mean "write new, non-libfuzzer IO code in C without using fuzzer::ReadDirToVectorOfUnits()? I could do the former easily, but the latter would require significant work to be cross-platform.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you be okay with C-binding-to-a-mangled-Cpp-symbol-of-libfuzzer or do you mean "write new, non-libfuzzer IO code in C without using fuzzer::ReadDirToVectorOfUnits()?

I would prefer a plain C variant, but C-binding-to-a-mangled-Cpp-symbol-of-libfuzzer would be okay too.

Comment on lines -28 to 29
int counter_index = 0;
size_t counter_index = 0;
// Number of counters given to Libfuzzer.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would split this commit to a number of commits:

Comment on lines +81 to +84
/*Epoch = */nullptr,
/*MaxSize = */SIZE_MAX,
/*ExitOnError = */false,
/*VPaths = */nullptr
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dead code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How so? They are not optional arguments, and this function call is the most important thing in this file. Do you mean inline-commented argument names? This is for readability. Should I remove them?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How so? They are not optional arguments, and this function call is the most important thing in this file. Do you mean inline-commented argument names? This is for readability.

Sorry, overlooked a real code due to non-usual comment style.

Should I remove them?

I would rewrite it to:

	fuzzer::ReadDirToVectorOfUnits(
		dirpath,
		&seed_corpus,
		/*Epoch */
                nullptr,
		/*MaxSize */
                SIZE_MAX,
		/*ExitOnError */
                false,
		/*VPaths */
                nullptr
     };

to avoid confusion.
Or even add a prototype with self-explained names of arguments to a comment:

/*
  * void ReadDirToVectorOfUnits(const char *Path, std::vector<Unit> *V,
  *                             long *Epoch, size_t MaxSize, bool ExitOnError);
  */

@@ -26,6 +26,8 @@
#include "version.h"
#include "luzer.h"

#define GLOBAL_BYTECODE_TO_COUNTERS_SCALE 4
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This requires a comment

"for k, v in pairs(table_to_count) do\n"
"if type(v) == 'function' and what(v) == 'Lua' then\n"
"-- we dont care for already-seen funcs\n"
"bytecode_size = bytecode_size + string.len(string.dump(v))\n"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe debug information should be stripped (string.dump(v, 1)).

Comment on lines +332 to +335
* Basically, this is stupid and straigtforward - table tree walk from '_G'.
* '_G' is Lua's special table for global stuff.
* 'string.dump' works even in latest LuaJIT. Bytecode is not crossplatform but we don't
* need that.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see a limitation of this approach: all Lua modules must be loaded before running the fuzzing process. Right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. In theory, we could update counters at runtime, but I would need to test if LF is okay with that. Tbh I guess second estimation strategy is better for the case when a lot of code is loaded dynamically.

The much bigger limitation r/n as I see it is local.

* And C implementation would require much more time.
*/
NO_SANITIZE static inline __attribute__((unused)) int
lua_approx_global_bytecode_size(lua_State *L)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename to something like lua_estimate_global_functions_bc_size.

Comment on lines +344 to +345
* This also can be written in C, but I see no reason for it. It should run only once.
* And C implementation would require much more time.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honestly, I don't like a current implementation. However, I don't know what would be better. Agree, that rewriting to Lua C will probably a waste of time and probably be less maintainable. Probably, we should put a Lua code to a separate file and embed it on build stage, see [^1] and [^2].

This Lua function could be loaded on initial stage like luaL_set_custom_mutator and exported in luzer module, see [^3].

  1. ligurio/lua-c-api-tests@719cf90
  2. ligurio/lua-c-api-tests@ba02938#diff-7fc5f49402ccaaa71949368218a21f5ee991358185ac6b01531b662259f9d585
  3. https://github.com/ligurio/luzer/blob/f37950549bca59330117cbb44943b1984ab98b2c/luzer/luzer.c#L429C27-L429C50

Comment on lines +17 to +21
- Two ways to approximate amount of counters for interpreted code.

### Fixed
- Interpreted code counter never handed to libfuzzer. (#12)
- Bad lifetime and initization of struct sigaction.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One commit - one changelog entry, please

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

__sanitizer_cov_8bit_counters_init never invoked for interpreter
2 participants