Experiments: add missing `#include <iterator>` #1

mtrofin · 2024-03-26T16:14:40Z

consolidated yamls
Fix C++ Formatting with clang-format
Reformat bazel files with buildifier (google/gematria#21)
Add support for running bazel build in CI (google/gematria#22)
Mark the cache/perf counter experiments as x86-64/Linux only. (google/gematria#23)
Only run push CI on main branch (google/gematria#25)
Fix python formatting CI job
Clean up LLVM includes.
Clang-format some files (google/gematria#29)
Add .swp files to gitignore
Set Bazel version in CI
Add exegesis BB annotator (google/gematria#30)
Add script to generate input code snippets for llvm-exegesis (google/gematria#28)
Bump Bazelisk version in container to 1.19.0
Bump LLVM version
Fix PR feedback
Add cmake-build to gitignore
Fix llvm-cm after switch to BBRanges (google/gematria#43)
Add test target for llvm-cm (google/gematria#42)
Add llvm-cm check target to CI (google/gematria#44)
Make exegesis annotator work with mappings that aren't page aligned (google/gematria#33)
Change constants in BHive conversion script (google/gematria#39)
Make exegesis conversion script use appropriate register class (google/gematria#37)
Add max_bb_count to exegesis converter (google/gematria#35)
Add JSON output option to BHive converter (google/gematria#38)
Add support for exegesis annotator in BHive conversion script (google/gematria#41)
Make --blocks-per-json-file unsigned
Bump LLVM Version (google/gematria#48)
Add progress reporting flag to annotator script
Address reviewer feedback, let user specify block count
Fix off by one error in annotator JSON output (google/gematria#49)
Fix comment spacing in bhive_importer.h
Add function to create proto from disassembled instructions (google/gematria#50)
Enable Bazel tests in CI (google/gematria#54)
Add lit tests for convert_bhive_to_llvm_exegesis_input (google/gematria#53)
Add additional lit tests for Exegesis conversion script (google/gematria#62)
Remove outdated comment
Add missing apostrophe in exegesis conversion error log
Add none annotator type (google/gematria#67)
Improve --report_progress_every flag (google/gematria#64)
Make blocks_per_json_file flag not output extra files (google/gematria#63)
Fix clang formatting issue (google/gematria#69)
Error out if we try to annotate a zero address (google/gematria#72)
Make exegesis annotator error out when remapping an address (google/gematria#73)
Bump LLVM version
Revert "Bump LLVM version"
Bump LLVM version (google/gematria#78)
Fix integer overflow errors in the benchmarks (caught by ASAN).
Include the faulting address in the error message when getting a SIGBUS
Fix uninitialized read.
Fixed an ASAN error.
Include the address we failed to map in the error message
Add a register dump when receiving an unexpected signal
Initialise the allocated accessed_addrs blocks
Initialize the entire PipedData object, not just each field.
Change BasicMov test to access 0x10000 instead of 0.
Internal Code Change
Include extra debug data when we didn't read a full PipedObject from the pipe.
Disable find_accessed_addrs_test under sanitizers.
Unmap address 0x800000000 in the child process.
Enable returning errors from the child process, use in a few places
Change initial register value from 0x10000 to 0x15000
Pass initial register values through, and return from FindAccessedAddrs
Cross off some completed TODOs
Try random register values when we fail to run the block.
Add a flag to write failed blocks to a new CSV file
Integrate LLVM at llvm/llvm-project@cca9f9b78fc6
Add --quiet flag to only show the final summary
Internal Change to handle pytype checks in this LSC.
Simplify RandomiseRegs and make trying out different sets of values easier
Randomise registers when getting SIGFPE too
Fix LIT tests that got broken by the internal changes.
NFC: Rename BUILD to BUILD.bazel to make Google-internal/GitHub sync easier.
Experiments: add missing #include <iterator>

This patch fixes the C++ formatting which was causing the clang-format job to fail. Very minor change, but if we're going to have C++ formatting in CI, we should probably keep it green.

The bazel files were also changed after the recent push and no longer are formatted correctly according to buildifier. This patch does the necessary reformatting.

This patch adds support for running the bazel build in CI to easily catch bazel build regressions. Running the bazel tests will be added in a future patch.

Mark the cache/perf counter experiments as x86-64/Linux only.

* The experimental code depends on x86-64 specific intrinsics, and breaks the build on Apple silicon and other non-x86 platforms.
* The perf counter access via the benchmark tool is Linux-specific.

Currently the github actions workflow runs whenever something is pushed to a branch or against any pull request. This means that if a branch is pushed to the main google/gematria repository and a PR is opened against google/gematria:main, there will be duplicate jobs, one for the push event and one for the pull request event. This patch fixes this behavior by restricting the push event to the main branch.

For some reason, using an older version of pyink causes odd failures in the CI complaining about missing functions in a vendored dependency. Upgrading to the most recent version fixes the issue. Maybe should try and make this more hermetic at some point with a lockfile or something, but for now this is a simple fix.

Use the "relative-to-llvm-root" includes for all LLVM headers.

The previous commit reworked all of the llvm includes, but broke the code formatting job in the meantime. This patch fixes the code formatting to get the CI back to green.

This prevents vim's .swp files from ending up in the repository and commits during development.

Currently the Bazel build is failing due to the CI trying to pull in bazel v7.0.0. This patch pins the bazel version to v6.4.0 which gets the CI green again, allowing further time to work on performing the necessary migration steps to get everything working properly with v7.0.0.

This patch adds in an alternative to the existing find_accessed_addrs infrastructure that uses llvm-exegesis as a backend. We believe this will more closely match the execution environment of the benchmarking environment and should avoid some of the pitfalls of the existing memory annotation infrastructure (at the current expense of speed).

Added a script `dataset/convert_bhive_to_llvm_exegesis_input.cc` that takes as input a BHive CSV file containing x86 basic blocks in hex format and an `llvm-exegesis` template that defines initial register values (`dataset/llvm-exegesis_wrapper.S` is used by default). It writes code snippets that can be executed by `llvm-exegesis` to the target `output_dir`. Each code snippet (output file) contains one x86 basic block.

### How to run it?

```bash
# build the latest repo
bazel build ...
# create an output directory
mkdir output
# run script
./bazel-bin/gematria/datasets/convert_bhive_to_llvm_exegesis_input --bhive_csv=gematria/testing/testdata/test.csv --output_dir=output
```

Nothing currently wrong with the old bazelisk version, but we're two minor versions out of date at this point and there's no real point in not updating.

This patch bumps the LLVM version to a recent ToT commit and patches the Exegesis annotator so that everything compiles and works correctly. Mostly wanting to get this in as I remember debugging weird failures due to things compiling despite a constructor changing within upstream Exegesis.

This is the canonical path to use for the CMake build according to the documentation. Add this to the gitignore so git add --all works with a CMake build setup within the repository.

A couple additional cases were recently added into the BB Address Map section in llvm/llvm-project#74128. This broke llvm-cm as a couple method names changed. This patch provides the quick fix to get everything working. Eventually llvm-cm will need some work to support multiple BBRanges within a single function.

This patch sets up a ninja target for testing llvm-cm, akin to the tool specific test targets in the monorepo. This allows for actually running the tests. Documentation has been added to the README.

Now that the check target is wired up through CMake, we should test this through CI to ensure that there aren't any regressions.

This patch makes the exegesis annotator work with mappings that are not page aligned. Currently, if a segfault occurs at an address that isn't page aligned, Exegesis tries to map memory at that exact address. The mmap call then fails silently because no error handling is wired up for it, the snippet segfaults again, and the annotator ends up in a loop.
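To illustrate the alignment issue: a fixed-address mapping (`mmap` with `MAP_FIXED`) requires a page-aligned address, so a faulting address has to be rounded down to the start of its page before it can be mapped. The following is a minimal sketch under that assumption; `AlignDownToPage` is a hypothetical helper and not the annotator's actual code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: rounds an address down to the start of its page.
// Assumes page_size is a non-zero power of two (4096 is typical on x86-64
// Linux).
inline std::uintptr_t AlignDownToPage(std::uintptr_t address,
                                      std::uintptr_t page_size) {
  assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
  // Clearing the low bits yields the largest multiple of page_size that is
  // not greater than address.
  return address & ~(page_size - 1);
}
```

On Linux, the page size can be queried at run time with `sysconf(_SC_PAGESIZE)` rather than hard-coded.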

This patch adjusts the constants in the BHive conversion script from the default values added in with the script to the values used within the original BHive paper. These constants were experimentally determined within the BHive paper to be high enough to not underflow/access addresses in the first page yet low enough to not access any addresses above the virtual address space ceiling. Anecdotally, these constants have also seemed to work better than the original ones. In addition, this patch prefixes the memory value with zeroes so that llvm-exegesis is able to assume the correct bit width.

This patch fixes the register class that the BHive to Exegesis conversion script uses. Currently it only pulls in registers that don't have a REX encoding, which doesn't even include R9-R15. This patch fixes that by using the more generic LLVM 64-bit GPR register class, restricted to registers that don't require a REX2 encoding, as there is no hardware in the wild yet that supports APX.

This patch adds a max_bb_count flag to the exegesis converter which makes it easier to generate only a subset of basic blocks from a large CSV for testing purposes. This reduces the need for manually splitting CSVs at the cost of little additional complexity.

This patch adds a JSON output option to the BHive converter. This makes it significantly easier to implement other scripts down the line that ingest this data. This also cuts down the number of inodes that a large data set will use by a significant amount, which can be a problem on some file systems.

This patch adds support for using the exegesis annotator in the BHive conversion script.

Before this patch, --blocks-per-json-file was set up as unsigned in the flag definitions, but the value was assigned to a normal int later on. This resulted in the check that the value is <= triggering with the default value of the maximum unsigned 32 bit integer. This patch fixes that by assigning the value of the flag to an unsigned value rather than a normal int.
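The signedness problem can be reproduced in isolation. The sketch below assumes a 32-bit `int` on a two's complement target and assumes the validity check requires a positive value (the exact comparison is cut off in the message above). All names are hypothetical and do not come from the converter's source.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for the flag's default: the largest 32-bit unsigned
// value.
constexpr std::uint32_t kDefaultBlocksPerJsonFile =
    std::numeric_limits<std::uint32_t>::max();

// Buggy pattern: storing the flag in a signed int turns the default into -1
// (assuming a 32-bit two's complement int), so a positivity check rejects it.
inline bool IsValidWhenStoredAsInt(std::uint32_t flag_value) {
  const int blocks = static_cast<int>(flag_value);
  return blocks > 0;
}

// Fixed pattern: keeping the value unsigned preserves the default.
inline bool IsValidWhenStoredAsUnsigned(std::uint32_t flag_value) {
  const std::uint32_t blocks = flag_value;
  return blocks > 0;
}

// Aborts if the two patterns do not behave as described above.
inline void CheckDefaultFlagHandling() {
  assert(!IsValidWhenStoredAsInt(kDefaultBlocksPerJsonFile));
  assert(IsValidWhenStoredAsUnsigned(kDefaultBlocksPerJsonFile));
}
```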

This patch bumps the LLVM version to the latest (as of writing the patch) LLVM commit. This is necessary to pull in new llvm-exegesis functionality like the middle half repetition mode.

This patch adds a progress reporting flag to the annotator script. This can be quite useful when using a slower implementation of FindAccessedAddrs like FindAccessedAddrsExegesis to get a gauge on what sort of progress is being made and to show that the application isn't stuck somewhere when processing a large number of blocks.

PiperOrigin-RevId: 576065680

PiperOrigin-RevId: 580863794

`CreateRandomLinkedList` is not initializing `value` for the last link. Switch to a backwards list creation, which makes the code much simpler. Add test. PiperOrigin-RevId: 582251222
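As a rough illustration of why building the list backwards is simpler, the sketch below links a vector of nodes from the last element to the first. Each iteration writes both fields of one node, so no node is left partially initialized. The `Node` layout and `LinkBackwards` are hypothetical, and the randomization done by the real `CreateRandomLinkedList` is omitted.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical node layout; the real benchmark's struct may differ.
struct Node {
  Node* next = nullptr;
  int value = 0;
};

// Links the nodes back to front. Every node, including the last one, gets
// both its value and its next pointer written in a single loop iteration.
// Returns the head of the list, or nullptr when `nodes` is empty.
inline Node* LinkBackwards(std::vector<Node>& nodes) {
  Node* head = nullptr;
  for (std::size_t i = nodes.size(); i > 0; --i) {
    Node& node = nodes[i - 1];
    node.value = static_cast<int>(i - 1);
    node.next = head;  // nullptr for the last node, its successor otherwise.
    head = &node;
  }
  return head;
}
```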

PiperOrigin-RevId: 582887102

PiperOrigin-RevId: 582887618

By default it's all zeroes. So anything read from memory will be zero. This causes issues with pointers to pointers, as we won't be able to map the inner zero address. And also causes floating point exceptions when we divide by the read value. PiperOrigin-RevId: 582989630

This doesn't matter now as there's no padding, but will matter later once we add extra fields. PiperOrigin-RevId: 584862339

0 isn't mappable, so this test only works right now because we don't treat it as a hard error when we fail to map an address accessed in a previous iteration. We're going to change this, so change the test. PiperOrigin-RevId: 586481033

PiperOrigin-RevId: 592152755

PiperOrigin-RevId: 592213983

The tool relies on low-level system libraries and process manipulation to detect the addresses, and it is incompatible with sanitizers that expect a well-behaved C++ program and use a lot of hacks of their own. PiperOrigin-RevId: 592216550

This is the address which will turn up when reading from newly mapped sections, due to how we initialize them. If a block dereferences this pointer, we want it to segfault, so that we can detect the read. On my machine it's unmapped, so all is well. On some machines, it isn't. Really we'd like to unmap everything we can, as blocks can access arbitrary addresses. But this is difficult to do without hitting things which turn out to be necessary and break things in weird ways when unmapped. But we can at least easily fix this one issue with a single targeted munmap. PiperOrigin-RevId: 596433115

Notably we now return an error when the child is unable to map all of the given addresses. With this change we now only succeed on ~85% of blocks from BHive, as before this error wasn't propagated and this case was counted as a success. PiperOrigin-RevId: 596435740

So that accessing memory at a small negative offset from a register will be possible. On the BHive dataset:

* Before this change, 46746 / 330016 blocks failed (~15%).
* After this change, 808 / 330016 blocks fail (~0.25%).

PiperOrigin-RevId: 596435824
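A possible reading of why the larger initial value helps, sketched under an explicit assumption: if the lowest address the child can map is `0x10000` (a common Linux `vm.mmap_min_addr` setting of 65536), then any negative offset from a register holding `0x10000` lands below that limit, while the same offset from `0x15000` does not. The constant and the function name are hypothetical and not taken from the tool.

```cpp
#include <cstdint>

// Assumption: addresses below this value cannot be mapped by the child.
constexpr std::uint64_t kAssumedMinMappableAddress = 0x10000;

// Returns true if `base_register + offset` is at or above the assumed lowest
// mappable address. Unsigned wraparound makes the addition behave like
// two's complement address arithmetic.
inline bool IsAccessMappable(std::uint64_t base_register,
                             std::int64_t offset) {
  const std::uint64_t address =
      base_register + static_cast<std::uint64_t>(offset);
  return address >= kAssumedMinMappableAddress;
}
```

Under this assumption, `IsAccessMappable(0x10000, -8)` is false and `IsAccessMappable(0x15000, -8)` is true.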

We still set them all to the same initial register value as before, but this lets us change that in the future, and also lets the caller know what they should be set to without having to depend on our implementation details. PiperOrigin-RevId: 599693469

PiperOrigin-RevId: 599693592

This seems to fix about half of the remaining failures:

* Before: 329211 / 330016
* After: 329605 / 330016

The exact number can differ, as this commit introduces randomness. PiperOrigin-RevId: 602572368
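The retry strategy can be sketched as a small loop: run the block with the fixed initial register value first and, if that fails, retry with random values. This is an illustration only. `FindWorkingRegisterValue`, its parameters, and the use of a single value for all registers are assumptions, and `run_block` stands in for whatever actually executes the block.

```cpp
#include <cstdint>
#include <functional>
#include <random>

// Hypothetical retry loop. `run_block` returns true when the block executes
// successfully with the given register value. On success, the value that
// worked is stored in `*working_value` and the function returns true.
inline bool FindWorkingRegisterValue(
    const std::function<bool(std::uint64_t)>& run_block,
    std::uint64_t initial_value, int max_random_attempts, std::uint64_t seed,
    std::uint64_t* working_value) {
  // Try the deterministic default first.
  if (run_block(initial_value)) {
    *working_value = initial_value;
    return true;
  }
  // Fall back to random values; a fixed seed keeps runs reproducible.
  std::mt19937_64 rng(seed);
  for (int attempt = 0; attempt < max_random_attempts; ++attempt) {
    const std::uint64_t candidate = rng();
    if (run_block(candidate)) {
      *working_value = candidate;
      return true;
    }
  }
  return false;
}
```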

This will make it easier to iterate on things that address failures. PiperOrigin-RevId: 602622698

Updates LLVM usage to match [cca9f9b78fc6](llvm/llvm-project@cca9f9b78fc6) PiperOrigin-RevId: 603664366

PiperOrigin-RevId: 604844084

PiperOrigin-RevId: 606143069

PiperOrigin-RevId: 606784084

SIGFPE can come from (for example) dividing one register by another, and the denominator is zero. So, randomising can potentially help. This only fixes ~20 of the remaining failures, but it's an easy change. PiperOrigin-RevId: 610920806

The changes are due to changes in behavior of FindAccessedAddrs() imported by the recent merge.


This probably "just works" because we're using a compiler pre D127675. An up to date compiler would require the missing include.
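The commit message does not say which `<iterator>` facility the experiments rely on. As a generic illustration, the sketch below uses `std::back_inserter`, which is declared in `<iterator>`. Code like this can compile without the include when another standard header happens to pull it in transitively, which is the kind of accident the message describes. `CopyEvens` is a hypothetical example function.

```cpp
#include <algorithm>
#include <iterator>  // Declares std::back_inserter; do not rely on transitive includes.
#include <vector>

// Copies the even elements of `input` into a new vector, appending through
// an insert iterator.
inline std::vector<int> CopyEvens(const std::vector<int>& input) {
  std::vector<int> evens;
  std::copy_if(input.begin(), input.end(), std::back_inserter(evens),
               [](int v) { return v % 2 == 0; });
  return evens;
}
```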

mtrofin and others added 30 commits October 13, 2023 21:38

consolidated yamls

9c1ca08

Fix C++ Formatting with clang-format

a4379c3

This patch fixes the C++ formatting which was causing the clang-format job to fail. Very minor change, but if we're going to have C++ formatting in CI, we should probably keep it green.

Reformat bazel files with buildifier (google#21)

f9ff064

The bazel files were also changed after the recent push and no longer are formatted correctly according to buildifier. This patch does the necessary reformatting.

Add support for running bazel build in CI (google#22)

4e33169

This patch adds support for running the bazel build in CI to easily catch bazel build regressions. Running the bazel tests will be added in a future patch.

Merge pull request google#26 from boomanaiden154/python-formatting-fix

7bdfd07

Clean up LLVM includes.

a5bac1d

Use the "relative-to-llvm-root" includes for all LLVM headers.

Clang-format some files (google#29)

93ef931

The previous commit reworked all of the llvm includes, but broke the code formatting job in the meantime. This patch fixes the code formatting to get the CI back to green.

Add .swp files to gitignore

2b8a8f0

This prevents vim's .swp files from ending up in the repository and commits during development.

Bump Bazelisk version in container to 1.19.0

778263f

Nothing currently wrong with the old bazelisk version, but we're two minor versions out of date at this point and there's no real point in not updating.

Fix PR feedback

8d90012

Add cmake-build to gitignore

44cf366

This is the canonical path to use for the CMake build according to the documentation. Add this to the gitignore so git add --all works with a CMake build setup within the repository.

Add test target for llvm-cm (google#42)

94c40d7

This patch sets up a ninja target for testing llvm-cm, akin to the tool specific test targets in the monorepo. This allows for actually running the tests. Documentation has been added to the README.

Add llvm-cm check target to CI (google#44)

671d245

Now that the check target is wired up through CMake, we should test this through CI to ensure that there aren't any regressions.

Add support for exegesis annotator in BHive conversion script (google#41)

78ff88e

This patch adds support for using the exegesis annotator in the BHive conversion script.

Bump LLVM Version (google#48)

53a972c

This patch bumps the LLVM version to the latest (as of writing the patch) LLVM commit. This is necessary to pull in new llvm-exegesis functionality like the middle half repetition mode.

ondrasej and others added 28 commits March 25, 2024 11:04

Fix integer overflow errors in the benchmarks (caught by ASAN).

e6530d0

PiperOrigin-RevId: 576065680

Include the faulting address in the error message when getting a SIGBUS

7df173b

PiperOrigin-RevId: 580863794

Fix uninitialized read.

7b376bb

`CreateRandomLinkedList` is not initializing `value` for the last link. Switch to a backwards list creation, which makes the code much simpler. Add test. PiperOrigin-RevId: 582251222

Fixed an ASAN error.

459dd72

Include the address we failed to map in the error message

43e7554

PiperOrigin-RevId: 582887102

Add a register dump when receiving an unexpected signal

b38f45c

PiperOrigin-RevId: 582887618

Initialize the entire PipedData object, not just each field.

09880f5

This doesn't matter now as there's no padding, but will matter later once we add extra fields. PiperOrigin-RevId: 584862339

Internal Code Change

15f139b

PiperOrigin-RevId: 592152755

Include extra debug data when we didn't read a full PipedObject from the pipe.

c873498

PiperOrigin-RevId: 592213983

Cross off some completed TODOs

94763c4

PiperOrigin-RevId: 599693592

Try random register values when we fail to run the block.

25a6803

This seems to fix about half of the remaining failures:

* Before: 329211 / 330016
* After: 329605 / 330016

The exact number can differ, as this commit introduces randomness. PiperOrigin-RevId: 602572368

Add a flag to write failed blocks to a new CSV file

1173126

This will make it easier to iterate on things that address failures. PiperOrigin-RevId: 602622698

Integrate LLVM at llvm/llvm-project@cca9f9b78fc6

7e45792

Updates LLVM usage to match [cca9f9b78fc6](llvm/llvm-project@cca9f9b78fc6) PiperOrigin-RevId: 603664366

Add --quiet flag to only show the final summary

2cf5e3f

PiperOrigin-RevId: 604844084

Internal Change to handle pytype checks in this LSC.

874b93d

PiperOrigin-RevId: 606143069

Simplify RandomiseRegs and make trying out different sets of values easier

0142087

PiperOrigin-RevId: 606784084

Randomise registers when getting SIGFPE too

45cbc76

SIGFPE can come from (for example) dividing one register by another, and the denominator is zero. So, randomising can potentially help. This only fixes ~20 of the remaining failures, but it's an easy change. PiperOrigin-RevId: 610920806

Merge internal changes into 'main'.

8025496

Fix LIT tests that got broken by the internal changes.

1b94370

The changes are due to changes in behavior of FindAccessedAddrs() imported by the recent merge.

NFC: Rename BUILD to BUILD.bazel to make Google-internal/GitHub sync easier.

86a6201

Experiments: add missing #include <iterator>

0ace52d

This probably "just works" because we're using a compiler pre D127675. An up to date compiler would require the missing include.

mtrofin closed this Mar 26, 2024

mtrofin deleted the include branch March 26, 2024 16:51
