Promote the speed of Gimli-based addr2line #308

marxin · 2024-06-29T08:30:22Z

First of all, thank you for the tool!

Let me explain how I discovered your tool. I was watching one of @jonhoo's videos about Inferno and I was curious how it works. I had a fuzzy memory (from the time I used FlameGraph) that the Perf scripts are slow so I tried to use Inferno for 2 projects: mold and MozillaFirefox. I collected a small profile for both of them and quickly realized the binutils' addr2line is really slow. My experiment just runs Firefox, opens https://html5test.com/ and finishes.

If I run perf report or (perf script), the loading in perf (mostly dominated with addr2line calls) takes >5 minutes. If I replace the binutils' addr2line with your implementation, I get down to 10 seconds which is an incredible improvement.
Based on strace profile, it seems addr2line is called about 20K x for the biggest Firefox' library libxul.so. If I take first 1000 address queries and run them among the various addr2line implementations, I get the following numbers:

❯ hyperfine "cat /tmp/1000.txt | time /home/marxin/Programming/addr2line/target/release/addr2line -e /usr/lib/debug/usr/lib64/firefox/libxul.so.debug -aif >/dev/null" "cat /tmp/1000.txt | time llvm-addr2line -e /usr/lib/debug/usr/lib64/firefox/libxul.so.debug -aif >/dev/null" "cat /tmp/1000.txt | time /usr/bin/addr2line -e /usr/lib/debug/usr/lib64/firefox/libxul.so.debug -aif >/dev/null" "cat /tmp/1000.txt | eu-addr2line -e /usr/lib/debug/usr/lib64/firefox/libxul.so.debug -aif >/dev/null"
Benchmark 1: cat /tmp/1000.txt | time /home/marxin/Programming/addr2line/target/release/addr2line -e /usr/lib/debug/usr/lib64/firefox/libxul.so.debug -aif >/dev/null
  Time (mean ± σ):      66.2 ms ±   2.6 ms    [User: 45.9 ms, System: 21.1 ms]
  Range (min … max):    61.5 ms …  72.5 ms    42 runs
 
Benchmark 2: cat /tmp/1000.txt | time llvm-addr2line -e /usr/lib/debug/usr/lib64/firefox/libxul.so.debug -aif >/dev/null
  Time (mean ± σ):     218.2 ms ±   4.3 ms    [User: 192.2 ms, System: 26.8 ms]
  Range (min … max):   211.4 ms … 224.1 ms    13 runs
 
Benchmark 3: cat /tmp/1000.txt | time /usr/bin/addr2line -e /usr/lib/debug/usr/lib64/firefox/libxul.so.debug -aif >/dev/null
  Time (mean ± σ):      2.181 s ±  0.071 s    [User: 1.947 s, System: 0.235 s]
  Range (min … max):    2.111 s …  2.351 s    10 runs
 
Benchmark 4: cat /tmp/1000.txt | eu-addr2line -e /usr/lib/debug/usr/lib64/firefox/libxul.so.debug -aif >/dev/null
  Time (mean ± σ):      5.194 s ±  0.344 s    [User: 5.175 s, System: 0.017 s]
  Range (min … max):    4.776 s …  5.744 s    10 runs
 
Summary
  cat /tmp/1000.txt | time /home/marxin/Programming/addr2line/target/release/addr2line -e /usr/lib/debug/usr/lib64/firefox/libxul.so.debug -aif >/dev/null ran
    3.29 ± 0.14 times faster than cat /tmp/1000.txt | time llvm-addr2line -e /usr/lib/debug/usr/lib64/firefox/libxul.so.debug -aif >/dev/null
   32.94 ± 1.67 times faster than cat /tmp/1000.txt | time /usr/bin/addr2line -e /usr/lib/debug/usr/lib64/firefox/libxul.so.debug -aif >/dev/null
   78.43 ± 6.03 times faster than cat /tmp/1000.txt | eu-addr2line -e /usr/lib/debug/usr/lib64/firefox/libxul.so.debug -aif >/dev/null

From the visual inspection of the output (I've just compared Gimpli and Binutils), the results are pretty much the same! That being said, I think you should bravely promote your tool in the README.md file. Good job!

Note the slowness of Binutils' perf has new implications related to newer perf releases as the tool newly applies a timeout which can make the situation even worse: https://bugzilla.kernel.org/show_bug.cgi?id=218996.

The text was updated successfully, but these errors were encountered:

philipc · 2024-07-01T02:35:01Z

Thanks for the feedback.

We used to do a benchmark comparison (there's still a benchmark script in the repository). However, keeping that up to date is additional maintenance effort that I didn't want to spend, and promoting old information feels wrong.

Another reason was that the bin application isn't as polished as it could be (although I have done some work on that recently).

marxin · 2024-07-01T11:14:51Z

Regarding the benchmarking effort: Is there anything I can help you with? It seems to me the existing Rust bench infrastructure already contains a way to get a random selection of addresses:

addr2line/benches/bench.rs

Lines 50 to 66 in db88c5a

    
           fn get_test_addresses(target: &object::File<'_>) -> Vec<u64> { 
        
               let addresses: Vec<_> = target 
        
                   .symbols() 
        
                   .filter(|s| s.kind() == object::SymbolKind::Text && s.address() != 0 && s.size() != 0) 
        
                   .take(200) 
        
                   .flat_map(|s| { 
        
                       let addr = s.address(); 
        
                       let size = s.size(); 
        
                       let end = addr + size; 
        
                       (addr..end).step_by(5) 
        
                   }) 
        
                   .collect(); 
        
               assert!(!addresses.is_empty()); 
        
               addresses 
        
           }

which can be easily transformed into something that would run various binaries for a selection of non-trivial ELF binaries with debug info. If you're willing to guide me, I can help you.

What kind of polishing do you mean?

Generally speaking, for a non-trivial binaries (medium to big applications), the performance is so outstanding to Binutils that I would not hesitate to mention it 🎉

mstange · 2024-07-02T20:47:29Z

@marxin You can also use samply import perf.data to look at the profile, and it might be even faster because it doesn't use a big text file as an intermediate. It uses this crate internally. https://github.com/mstange/samply

philipc · 2024-07-03T03:53:53Z

The main problem with running the benchmarks is having a reproducible environment containing recent versions of the tools. I don't think this is a hard problem, it just hasn't been a priority.

For polishing, the main thing lacking is in locating the various places where the debug info needs to be loaded from. #296 and #298 contained some improvements for this, but there's still more to be done. There's maybe other minor things, e.g. some unwraps in the cli code.

There's also wholesym-addr2line (part of samply) which probably does all the debug info locating correctly, and handles pdb as well.

marxin · 2024-07-11T12:36:37Z

The main problem with running the benchmarks is having a reproducible environment containing recent versions of the tools. I don't think this is a hard problem, it just hasn't been a priority.

Yep. I might offer a rolling Linux distribution like openSUSE Tumbleweed, which can be run in a container with something like:

FROM opensuse/tumbleweed
RUN zypper addrepo http://download.opensuse.org/debug/tumbleweed/repo/oss/ debug
RUN zypper in -y MozillaFirefox-debuginfo binutils time llvm

The simplest testing approach can be something like:

llvm-nm /usr/lib/debug/usr/lib64/firefox/libxul.so.debug | grep '^00' | cut -f1 -d ' ' | shuf | head -n1000 | time addr2line --exe /usr/lib/debug/usr/lib64/firefox/libxul.so.debug -aif

Is it feasible for the benchmarking?

There's also wholesym-addr2line (part of samply) which probably does all the debug info locating correctly, and handles pdb as well.

@mstange Is the ambition to have wholesym-addr2line command line tool available on crates.io as a drop-in replacement for llvm-addr2line or Binutils' addr2line?

philipc · 2024-07-25T08:33:41Z

I'm not familiar with openSUSE. If you think it's a good choice then I'll accept it. I'll accept a PR for something that will have low ongoing maintenance, but I don't want to spend much time thinking about it myself.

llvm-nm /usr/lib/debug/usr/lib64/firefox/libxul.so.debug | grep '^00' | cut -f1 -d ' ' | shuf | head -n1000 | time addr2line --exe /usr/lib/debug/usr/lib64/firefox/libxul.so.debug -aif

I don't think llvm-nm is a good source of addresses, since it will only give the start address of each function, but we want addresses in the middle too where inlining is more likely. You could use the --all option from the addr2line in this crate: target/release/addr2line -ap --all -e ....

You'll need to specify a random source for shuf that is deterministic.

marxin · 2024-07-25T13:08:42Z

I've just made a #315 where I added a benchmarking script, please take a look.

philipc · 2024-08-02T08:44:43Z

I've added a plot for those benchmark results to the readme in #322, so I'm going to close this as done. Thanks for your work on this, and I'm happy to accept further PRs to expand on this if you see the need.

marxin · 2024-08-05T05:24:35Z

The plotting output is a great idea and I've just improved it a bit in #325.

philipc closed this as completed Aug 2, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Promote the speed of Gimli-based addr2line #308

Promote the speed of Gimli-based addr2line #308

marxin commented Jun 29, 2024

philipc commented Jul 1, 2024

marxin commented Jul 1, 2024

mstange commented Jul 2, 2024

philipc commented Jul 3, 2024

marxin commented Jul 11, 2024

philipc commented Jul 25, 2024

marxin commented Jul 25, 2024

philipc commented Aug 2, 2024

marxin commented Aug 5, 2024

Promote the speed of Gimli-based addr2line #308

Promote the speed of Gimli-based addr2line #308

Comments

marxin commented Jun 29, 2024

philipc commented Jul 1, 2024

marxin commented Jul 1, 2024

mstange commented Jul 2, 2024

philipc commented Jul 3, 2024

marxin commented Jul 11, 2024

philipc commented Jul 25, 2024

marxin commented Jul 25, 2024

philipc commented Aug 2, 2024

marxin commented Aug 5, 2024