Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Promote the speed of Gimli-based addr2line #308

Closed
marxin opened this issue Jun 29, 2024 · 9 comments
Closed

Promote the speed of Gimli-based addr2line #308

marxin opened this issue Jun 29, 2024 · 9 comments

Comments

@marxin
Copy link
Contributor

marxin commented Jun 29, 2024

First of all, thank you for the tool!

Let me explain how I discovered your tool. I was watching one of @jonhoo's videos about Inferno and I was curious how it works. I had a fuzzy memory (from the time I used FlameGraph) that the Perf scripts are slow so I tried to use Inferno for 2 projects: mold and MozillaFirefox. I collected a small profile for both of them and quickly realized the binutils' addr2line is really slow. My experiment just runs Firefox, opens https://html5test.com/ and finishes.

If I run perf report or (perf script), the loading in perf (mostly dominated with addr2line calls) takes >5 minutes. If I replace the binutils' addr2line with your implementation, I get down to 10 seconds which is an incredible improvement.
Based on strace profile, it seems addr2line is called about 20K x for the biggest Firefox' library libxul.so. If I take first 1000 address queries and run them among the various addr2line implementations, I get the following numbers:

❯ hyperfine "cat /tmp/1000.txt | time /home/marxin/Programming/addr2line/target/release/addr2line -e /usr/lib/debug/usr/lib64/firefox/libxul.so.debug -aif >/dev/null" "cat /tmp/1000.txt | time llvm-addr2line -e /usr/lib/debug/usr/lib64/firefox/libxul.so.debug -aif >/dev/null" "cat /tmp/1000.txt | time /usr/bin/addr2line -e /usr/lib/debug/usr/lib64/firefox/libxul.so.debug -aif >/dev/null" "cat /tmp/1000.txt | eu-addr2line -e /usr/lib/debug/usr/lib64/firefox/libxul.so.debug -aif >/dev/null"
Benchmark 1: cat /tmp/1000.txt | time /home/marxin/Programming/addr2line/target/release/addr2line -e /usr/lib/debug/usr/lib64/firefox/libxul.so.debug -aif >/dev/null
  Time (mean ± σ):      66.2 ms ±   2.6 ms    [User: 45.9 ms, System: 21.1 ms]
  Range (min … max):    61.5 ms …  72.5 ms    42 runs
 
Benchmark 2: cat /tmp/1000.txt | time llvm-addr2line -e /usr/lib/debug/usr/lib64/firefox/libxul.so.debug -aif >/dev/null
  Time (mean ± σ):     218.2 ms ±   4.3 ms    [User: 192.2 ms, System: 26.8 ms]
  Range (min … max):   211.4 ms … 224.1 ms    13 runs
 
Benchmark 3: cat /tmp/1000.txt | time /usr/bin/addr2line -e /usr/lib/debug/usr/lib64/firefox/libxul.so.debug -aif >/dev/null
  Time (mean ± σ):      2.181 s ±  0.071 s    [User: 1.947 s, System: 0.235 s]
  Range (min … max):    2.111 s …  2.351 s    10 runs
 
Benchmark 4: cat /tmp/1000.txt | eu-addr2line -e /usr/lib/debug/usr/lib64/firefox/libxul.so.debug -aif >/dev/null
  Time (mean ± σ):      5.194 s ±  0.344 s    [User: 5.175 s, System: 0.017 s]
  Range (min … max):    4.776 s …  5.744 s    10 runs
 
Summary
  cat /tmp/1000.txt | time /home/marxin/Programming/addr2line/target/release/addr2line -e /usr/lib/debug/usr/lib64/firefox/libxul.so.debug -aif >/dev/null ran
    3.29 ± 0.14 times faster than cat /tmp/1000.txt | time llvm-addr2line -e /usr/lib/debug/usr/lib64/firefox/libxul.so.debug -aif >/dev/null
   32.94 ± 1.67 times faster than cat /tmp/1000.txt | time /usr/bin/addr2line -e /usr/lib/debug/usr/lib64/firefox/libxul.so.debug -aif >/dev/null
   78.43 ± 6.03 times faster than cat /tmp/1000.txt | eu-addr2line -e /usr/lib/debug/usr/lib64/firefox/libxul.so.debug -aif >/dev/null

From the visual inspection of the output (I've just compared Gimpli and Binutils), the results are pretty much the same! That being said, I think you should bravely promote your tool in the README.md file. Good job!

Note the slowness of Binutils' perf has new implications related to newer perf releases as the tool newly applies a timeout which can make the situation even worse: https://bugzilla.kernel.org/show_bug.cgi?id=218996.

@philipc
Copy link
Contributor

philipc commented Jul 1, 2024

Thanks for the feedback.

We used to do a benchmark comparison (there's still a benchmark script in the repository). However, keeping that up to date is additional maintenance effort that I didn't want to spend, and promoting old information feels wrong.

Another reason was that the bin application isn't as polished as it could be (although I have done some work on that recently).

@marxin
Copy link
Contributor Author

marxin commented Jul 1, 2024

Regarding the benchmarking effort: Is there anything I can help you with? It seems to me the existing Rust bench infrastructure already contains a way to get a random selection of addresses:

fn get_test_addresses(target: &object::File<'_>) -> Vec<u64> {
let addresses: Vec<_> = target
.symbols()
.filter(|s| s.kind() == object::SymbolKind::Text && s.address() != 0 && s.size() != 0)
.take(200)
.flat_map(|s| {
let addr = s.address();
let size = s.size();
let end = addr + size;
(addr..end).step_by(5)
})
.collect();
assert!(!addresses.is_empty());
addresses
}

which can be easily transformed into something that would run various binaries for a selection of non-trivial ELF binaries with debug info. If you're willing to guide me, I can help you.

What kind of polishing do you mean?

Generally speaking, for a non-trivial binaries (medium to big applications), the performance is so outstanding to Binutils that I would not hesitate to mention it 🎉

@mstange
Copy link
Contributor

mstange commented Jul 2, 2024

@marxin You can also use samply import perf.data to look at the profile, and it might be even faster because it doesn't use a big text file as an intermediate. It uses this crate internally. https://github.com/mstange/samply

@philipc
Copy link
Contributor

philipc commented Jul 3, 2024

The main problem with running the benchmarks is having a reproducible environment containing recent versions of the tools. I don't think this is a hard problem, it just hasn't been a priority.

For polishing, the main thing lacking is in locating the various places where the debug info needs to be loaded from. #296 and #298 contained some improvements for this, but there's still more to be done. There's maybe other minor things, e.g. some unwraps in the cli code.

There's also wholesym-addr2line (part of samply) which probably does all the debug info locating correctly, and handles pdb as well.

@marxin
Copy link
Contributor Author

marxin commented Jul 11, 2024

The main problem with running the benchmarks is having a reproducible environment containing recent versions of the tools. I don't think this is a hard problem, it just hasn't been a priority.

Yep. I might offer a rolling Linux distribution like openSUSE Tumbleweed, which can be run in a container with something like:

FROM opensuse/tumbleweed
RUN zypper addrepo http://download.opensuse.org/debug/tumbleweed/repo/oss/ debug
RUN zypper in -y MozillaFirefox-debuginfo binutils time llvm

The simplest testing approach can be something like:

llvm-nm /usr/lib/debug/usr/lib64/firefox/libxul.so.debug | grep '^00' | cut -f1 -d ' ' | shuf | head -n1000 | time addr2line --exe /usr/lib/debug/usr/lib64/firefox/libxul.so.debug -aif

Is it feasible for the benchmarking?

There's also wholesym-addr2line (part of samply) which probably does all the debug info locating correctly, and handles pdb as well.

@mstange Is the ambition to have wholesym-addr2line command line tool available on crates.io as a drop-in replacement for llvm-addr2line or Binutils' addr2line?

@philipc
Copy link
Contributor

philipc commented Jul 25, 2024

I'm not familiar with openSUSE. If you think it's a good choice then I'll accept it. I'll accept a PR for something that will have low ongoing maintenance, but I don't want to spend much time thinking about it myself.

llvm-nm /usr/lib/debug/usr/lib64/firefox/libxul.so.debug | grep '^00' | cut -f1 -d ' ' | shuf | head -n1000 | time addr2line --exe /usr/lib/debug/usr/lib64/firefox/libxul.so.debug -aif

I don't think llvm-nm is a good source of addresses, since it will only give the start address of each function, but we want addresses in the middle too where inlining is more likely. You could use the --all option from the addr2line in this crate: target/release/addr2line -ap --all -e ....

You'll need to specify a random source for shuf that is deterministic.

@marxin
Copy link
Contributor Author

marxin commented Jul 25, 2024

I've just made a #315 where I added a benchmarking script, please take a look.

@philipc
Copy link
Contributor

philipc commented Aug 2, 2024

I've added a plot for those benchmark results to the readme in #322, so I'm going to close this as done. Thanks for your work on this, and I'm happy to accept further PRs to expand on this if you see the need.

@philipc philipc closed this as completed Aug 2, 2024
@marxin
Copy link
Contributor Author

marxin commented Aug 5, 2024

The plotting output is a great idea and I've just improved it a bit in #325.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants