data_prep: only include c/cpp files as potential harnesses (#317)
The search for potential harnesses needs to be more precise, because the
current check matches far too many files and makes the scan slow.

For example, when analyzing `croaring`, the following files all match the
existing check, but none of them could be the actual source code of a
harness.

```sh
/tmp/tmpjg7oqx4m/out/src/croaring/build-dir/CMakeFiles/CMakeScratch/TryCompile-9ReWwb/cmake_install.cmake
/tmp/tmpjg7oqx4m/out/src/croaring/build-dir/CMakeFiles/CMakeScratch/TryCompile-9ReWwb/CMakeFiles/cmake.check_cache
/tmp/tmpjg7oqx4m/out/src/croaring/build-dir/CMakeFiles/CMakeScratch/TryCompile-9ReWwb/CMakeFiles/CMakeDirectoryInformation.cmake
/tmp/tmpjg7oqx4m/out/src/croaring/build-dir/CMakeFiles/CMakeScratch/TryCompile-9ReWwb/CMakeFiles/cmTC_bc5f1.dir/DependInfo.cmake
/tmp/tmpjg7oqx4m/out/src/croaring/build-dir/CMakeFiles/CMakeScratch/TryCompile-9ReWwb/CMakeFiles/cmTC_bc5f1.dir/CMakeCXXCompilerABI.cpp.o.d
/tmp/tmpjg7oqx4m/out/src/croaring/build-dir/CMakeFiles/CMakeScratch/TryCompile-9ReWwb/CMakeFiles/cmTC_bc5f1.dir/cmake_clean.cmake
/tmp/tmpjg7oqx4m/out/src/croaring/build-dir/CMakeFiles/CMakeScratch/TryCompile-9ReWwb/CMakeFiles/cmTC_bc5f1.dir/CMakeCXXCompilerABI.cpp.o
```
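The common thread is that the old check is a substring test, not a suffix test: `.cmake`, `.check_cache` and `.cpp.o` all contain the two characters `.c`. The minimal Python sketch below reproduces this by applying the removed condition, `'.c' in path`, to the paths listed above. The helper name `old_is_potential_harness` is hypothetical and does not exist in `project_src.py`.

```python
import os

# Directory prefix shared by every false positive listed above.
TRY_COMPILE_DIR = ('/tmp/tmpjg7oqx4m/out/src/croaring/build-dir/'
                   'CMakeFiles/CMakeScratch/TryCompile-9ReWwb')

# Paths relative to TRY_COMPILE_DIR, copied from the croaring example.
RELATIVE_PATHS = [
    'cmake_install.cmake',
    'CMakeFiles/cmake.check_cache',
    'CMakeFiles/CMakeDirectoryInformation.cmake',
    'CMakeFiles/cmTC_bc5f1.dir/DependInfo.cmake',
    'CMakeFiles/cmTC_bc5f1.dir/CMakeCXXCompilerABI.cpp.o.d',
    'CMakeFiles/cmTC_bc5f1.dir/cmake_clean.cmake',
    'CMakeFiles/cmTC_bc5f1.dir/CMakeCXXCompilerABI.cpp.o',
]
CROARING_PATHS = [os.path.join(TRY_COMPILE_DIR, p) for p in RELATIVE_PATHS]


def old_is_potential_harness(path: str) -> bool:
    """Hypothetical helper mirroring the old check: a substring test."""
    return '.c' in path


# '.cmake', '.check_cache' and '.cpp.o' all contain '.c', so every one of
# these build artifacts is kept as a potential harness.
for path in CROARING_PATHS:
    print(old_is_potential_harness(path), os.path.basename(path))
```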

For `croaring`, the current approach identifies 1200+ potential harnesses,
and scanning them takes a long time (tens of minutes) because many of the
wrong files hit the 100-second timeout in `clang-format`.

This PR makes the check more precise so that only relevant source code
files are included. For `croaring`, this reduces the count to 65 potential
harnesses (including the actual harnesses), and scanning for potential
harnesses takes a couple of seconds.
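A suffix test avoids these false positives. The sketch below mirrors the new condition, `any(path.endswith(suffix) for suffix in SEARCH_EXTS)`. The real `SEARCH_EXTS` is defined outside the changed lines of `data_prep/project_src.py`, so the tuple used here is an assumption for illustration, as are the helper names and the sample paths.

```python
import os

# ASSUMPTION: illustrative extension tuple; the real SEARCH_EXTS in
# data_prep/project_src.py may list different suffixes.
SEARCH_EXTS = ('.c', '.cc', '.cpp', '.cxx')


def new_is_potential_harness(path: str) -> bool:
    """Hypothetical helper mirroring the new check: a suffix test."""
    return any(path.endswith(suffix) for suffix in SEARCH_EXTS)


def new_is_potential_harness_tuple(path: str) -> bool:
    """Equivalent single call: str.endswith also accepts a tuple."""
    return path.endswith(tuple(SEARCH_EXTS))


# Hypothetical sample paths mapped to the expected result.
samples = {
    'build-dir/CMakeFiles/cmake_install.cmake': False,  # CMake script
    'build-dir/CMakeFiles/cmake.check_cache': False,  # CMake cache stamp
    'build-dir/cmTC_bc5f1.dir/CMakeCXXCompilerABI.cpp.o': False,  # object
    'build-dir/cmTC_bc5f1.dir/CMakeCXXCompilerABI.cpp.o.d': False,  # deps
    'fuzz/example_fuzzer.c': True,  # C source
    'fuzz/example_fuzzer.cc': True,  # C++ source
    'include/example.h': False,  # headers are excluded
}

for path, expected in samples.items():
    got = new_is_potential_harness(path)
    # Both forms must agree with each other and with the expectation.
    assert got == expected == new_is_potential_harness_tuple(path)
    print(f'{got!s:5} {os.path.basename(path)}')
```

Both forms are case-sensitive, so a file spelled `fuzzer.CPP` would only be picked up if that exact suffix were listed.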

---------

Signed-off-by: David Korczynski <[email protected]>
DavidKorczynski authored Jun 12, 2024
1 parent d220e84 commit 0bed2b5
Showing 1 changed file, `data_prep/project_src.py`, with 2 additions and 3 deletions.
```diff
@@ -312,9 +312,8 @@ def _identify_fuzz_targets(out: str, interesting_filenames: list[str],
 # TODO(dongge): Figure out why the path does not match Bazel projects.
 if os.path.basename(short_path) in interesting_filenames:
 interesting_filepaths.append(path)
-# This should also include .cpp and .cc but exclude headers which
-# usually don't contain fuzzer definitions.
-if '.c' in path:
+
+if any(path.endswith(suffix) for suffix in SEARCH_EXTS):
 potential_harnesses.append(path)
 
 return potential_harnesses, interesting_filepaths
```