Proof-of-Concept: Fast path for search #9

larry-the-table-guy · 2024-11-16T16:50:13Z

TL;DR: In practice, this just reduces CPU usage slightly for typical datasets.

Main idea in this PR is that most files can be rejected by just looking at the first few bytes (it's in the server's own interest to not break this pattern).
For the sake of keeping this PR small, I assumed p1team appears after p1, p2. This could be done in a less hacky manner by searching for "p1": and "p2":, then parsing out the string values.

The overwhelming bottleneck is how the battle logs are stored - many small JSON files. Storage devices are just very bad at small random reads; even SSDs much prefer long sequential reads.

Anyway, this was a quick thing I thought I'd share, maybe it gives some ideas for future problems.

`cargo bench` stats

This is slower on files with matching player IDs, and faster on files that don't match (the common case).
I added a bench case to help demonstrate that.
Most of the overhead added to the former case could be avoided by reorganizing the code so that the file is only opened once, but that's an invasive change and I just wanted to demonstrate the core idea here.

Before

$ cargo bench "search"
    Finished `bench` profile [optimized] target(s) in 0.03s
     Running unittests src/main.rs (target/release/deps/psbattletools-7147a87a14c28c1d)

running 6 tests
test search::unit_tests::bench_handle_directory_1k                ... bench:   2,522,612.22 ns/iter (+/- 955,223.83)
test search::unit_tests::bench_handle_directory_1k_absent         ... bench:   1,827,392.32 ns/iter (+/- 93,840.33)
test search::unit_tests::bench_parse                              ... bench:       3,876.60 ns/iter (+/- 133.43)
test search::unit_tests::bench_parse_forfeit_wins_only            ... bench:         896.47 ns/iter (+/- 20.42)
test search::unit_tests::bench_parse_forfeits_only                ... bench:       3,475.20 ns/iter (+/- 227.75)
test search::unit_tests::bench_parse_wins_only                    ... bench:         896.90 ns/iter (+/- 30.81)

test result: ok. 0 passed; 0 failed; 0 ignored; 6 measured; 14 filtered out; finished in 11.14s

     Running benches/integration.rs (target/release/deps/integration-020071ea7b7992b0)

running 2 tests
test bench_old_battlesearch_1k ... ignored
test bench_search_1k           ... bench:   5,855,477.70 ns/iter (+/- 1,518,084.37)

test result: ok. 0 passed; 0 failed; 1 ignored; 1 measured; 4 filtered out; finished in 5.40s

After

$ cargo bench "search"
   Compiling psbattletools v0.2.4 (/home/_/code/psbattletools)
    Finished `bench` profile [optimized] target(s) in 1.62s
     Running unittests src/main.rs (target/release/deps/psbattletools-7147a87a14c28c1d)

running 6 tests
test search::unit_tests::bench_handle_directory_1k                ... bench:   3,275,835.15 ns/iter (+/- 258,127.03)
test search::unit_tests::bench_handle_directory_1k_absent         ... bench:   1,495,995.55 ns/iter (+/- 76,207.67)
test search::unit_tests::bench_parse                              ... bench:       3,867.18 ns/iter (+/- 296.49)
test search::unit_tests::bench_parse_forfeit_wins_only            ... bench:         735.51 ns/iter (+/- 78.50)
test search::unit_tests::bench_parse_forfeits_only                ... bench:       3,196.23 ns/iter (+/- 441.67)
test search::unit_tests::bench_parse_wins_only                    ... bench:         739.44 ns/iter (+/- 25.11)

test result: ok. 0 passed; 0 failed; 0 ignored; 6 measured; 14 filtered out; finished in 11.17s

     Running benches/integration.rs (target/release/deps/integration-020071ea7b7992b0)

running 2 tests
test bench_old_battlesearch_1k ... ignored
test bench_search_1k           ... bench:   6,041,682.95 ns/iter (+/- 1,104,569.47)

test result: ok. 0 passed; 0 failed; 1 ignored; 1 measured; 4 filtered out; finished in 5.65s

`perf` stats

Constructed a directory with 500K files, 8GB total.

Uncached

The target usecase.
Dominated by FS and disk seeks.

echo 3 | sudo tee /proc/sys/vm/drop_caches to clear the OS's file cache between each run. My drive doesn't have its own cache.

Before

$perf stat -- psbattletools --threads 3 search alice ~/code/pokemon-showdown/logs/2024-11/gen1randombattle/2024-11-10/
Parsing /home/_/code/pokemon-showdown/logs/2024-11/gen1randombattle/2024-11-10/...

 Performance counter stats for 'psbattletools --threads 3 search alice /home/_/code/pokemon-showdown/logs/2024-11/gen1randombattle/2024-11-10/':

         18,384.52 msec task-clock:u                     #    0.222 CPUs utilized             
                 0      context-switches:u               #    0.000 /sec                      
                 0      cpu-migrations:u                 #    0.000 /sec                      
            11,486      page-faults:u                    #  624.765 /sec                      
     8,997,947,822      cycles:u                         #    0.489 GHz                       
     9,440,432,842      instructions:u                   #    1.05  insn per cycle            
     2,280,047,807      branches:u                       #  124.020 M/sec                     
        24,748,638      branch-misses:u                  #    1.09% of all branches           

      82.965860922 seconds time elapsed

       3.196036000 seconds user
      15.260381000 seconds sys

After

$perf stat -- ./target/release/psbattletools --threads 3 search alice ~/code/pokemon-showdown/logs/2024-11/gen1randombattle/2024-11-10/
Parsing /home/_/code/pokemon-showdown/logs/2024-11/gen1randombattle/2024-11-10/...

 Performance counter stats for './target/release/psbattletools --threads 3 search alice /home/_/code/pokemon-showdown/logs/2024-11/gen1randombattle/2024-11-10/':

         15,295.64 msec task-clock:u                     #    0.182 CPUs utilized             
                 0      context-switches:u               #    0.000 /sec                      
                 0      cpu-migrations:u                 #    0.000 /sec                      
            11,374      page-faults:u                    #  743.611 /sec                      
     5,439,202,925      cycles:u                         #    0.356 GHz                       
     3,971,634,215      instructions:u                   #    0.73  insn per cycle            
       795,169,931      branches:u                       #   51.987 M/sec                     
        10,860,116      branch-misses:u                  #    1.37% of all branches           

      84.188491028 seconds time elapsed

       1.957654000 seconds user
      13.414479000 seconds sys

Cached

Not a realistic scenario, but better isolates the CPU-intensive portion.

Before

$perf stat -- psbattletools --threads 3 search alice ~/code/pokemon-showdown/logs/2024-11/gen1randombattle/2024-11-10/
Parsing /home/_/code/pokemon-showdown/logs/2024-11/gen1randombattle/2024-11-10/...

 Performance counter stats for 'psbattletools --threads 3 search alice /home/_/code/pokemon-showdown/logs/2024-11/gen1randombattle/2024-11-10/':

          4,400.00 msec task-clock:u                     #    2.506 CPUs utilized             
                 0      context-switches:u               #    0.000 /sec                      
                 0      cpu-migrations:u                 #    0.000 /sec                      
            11,468      page-faults:u                    #    2.606 K/sec                     
     3,779,035,315      cycles:u                         #    0.859 GHz                       
     9,440,387,186      instructions:u                   #    2.50  insn per cycle            
     2,280,039,028      branches:u                       #  518.191 M/sec                     
         4,516,395      branch-misses:u                  #    0.20% of all branches           

       1.756031579 seconds time elapsed

       1.683201000 seconds user
       2.716097000 seconds sys

After

$perf stat -- ./target/release/psbattletools --threads 3 search alice ~/code/pokemon-showdown/logs/2024-11/gen1randombattle/2024-11-10/
Parsing /home/_/code/pokemon-showdown/logs/2024-11/gen1randombattle/2024-11-10/...

 Performance counter stats for './target/release/psbattletools --threads 3 search alice /home/_/code/pokemon-showdown/logs/2024-11/gen1randombattle/2024-11-10/':

          2,681.52 msec task-clock:u                     #    2.560 CPUs utilized             
                 0      context-switches:u               #    0.000 /sec                      
                 0      cpu-migrations:u                 #    0.000 /sec                      
            11,358      page-faults:u                    #    4.236 K/sec                     
     2,316,709,912      cycles:u                         #    0.864 GHz                       
     4,462,151,111      instructions:u                   #    1.93  insn per cycle            
       899,173,185      branches:u                       #  335.322 M/sec                     
         2,276,173      branch-misses:u                  #    0.25% of all branches           

       1.047270698 seconds time elapsed

       1.054629000 seconds user
       1.626428000 seconds sys

Load small prefix of file and attempt to use that to reject files

Would behave incorrectly if a string ended with a comma

Also fix 2 clippy warnings for redundant refs

In its infinite wisdom, clippy won't let me push my changes until this unrelated lint is fixed.

larry-the-table-guy added 4 commits November 16, 2024 10:11

Add fast path for search

6ab3bb1

Load small prefix of file and attempt to use that to reject files

Fix edge case in fast path's JSON fixup

a0e9792

Would behave incorrectly if a string ended with a comma

Add bench case for search w/ no matches

9a8f0da

Also fix 2 clippy warnings for redundant refs

Fix clippy lint for needless return

b80f040

In its infinite wisdom, clippy won't let me push my changes until this unrelated lint is fixed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proof-of-Concept: Fast path for search #9

Proof-of-Concept: Fast path for search #9

larry-the-table-guy commented Nov 16, 2024 •

edited

Loading

Proof-of-Concept: Fast path for search #9

Are you sure you want to change the base?

Proof-of-Concept: Fast path for search #9

Conversation

larry-the-table-guy commented Nov 16, 2024 • edited Loading

cargo bench stats

Before

After

perf stats

Uncached

Before

After

Cached

Before

After

larry-the-table-guy commented Nov 16, 2024 •

edited

Loading

`cargo bench` stats

`perf` stats