Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proof-of-Concept: Fast path for search #9

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

larry-the-table-guy
Copy link

@larry-the-table-guy larry-the-table-guy commented Nov 16, 2024

TL;DR: In practice, this just reduces CPU usage slightly for typical datasets.

Main idea in this PR is that most files can be rejected by just looking at the first few bytes (it's in the server's own interest to not break this pattern).
For the sake of keeping this PR small, I assumed p1team appears after p1, p2. This could be done in a less hacky manner by searching for "p1": and "p2":, then parsing out the string values.

The overwhelming bottleneck is how the battle logs are stored - many small JSON files. Storage devices are just very bad at small random reads; even SSDs much prefer long sequential reads.

Anyway, this was a quick thing I thought I'd share, maybe it gives some ideas for future problems.

cargo bench stats

This is slower on files with matching player IDs, and faster on files that don't match (the common case).
I added a bench case to help demonstrate that.
Most of the overhead added to the former case could be avoided by reorganizing the code so that the file is only opened once, but that's an invasive change and I just wanted to demonstrate the core idea here.

Before

$ cargo bench "search"
    Finished `bench` profile [optimized] target(s) in 0.03s
     Running unittests src/main.rs (target/release/deps/psbattletools-7147a87a14c28c1d)

running 6 tests
test search::unit_tests::bench_handle_directory_1k                ... bench:   2,522,612.22 ns/iter (+/- 955,223.83)
test search::unit_tests::bench_handle_directory_1k_absent         ... bench:   1,827,392.32 ns/iter (+/- 93,840.33)
test search::unit_tests::bench_parse                              ... bench:       3,876.60 ns/iter (+/- 133.43)
test search::unit_tests::bench_parse_forfeit_wins_only            ... bench:         896.47 ns/iter (+/- 20.42)
test search::unit_tests::bench_parse_forfeits_only                ... bench:       3,475.20 ns/iter (+/- 227.75)
test search::unit_tests::bench_parse_wins_only                    ... bench:         896.90 ns/iter (+/- 30.81)

test result: ok. 0 passed; 0 failed; 0 ignored; 6 measured; 14 filtered out; finished in 11.14s

     Running benches/integration.rs (target/release/deps/integration-020071ea7b7992b0)

running 2 tests
test bench_old_battlesearch_1k ... ignored
test bench_search_1k           ... bench:   5,855,477.70 ns/iter (+/- 1,518,084.37)

test result: ok. 0 passed; 0 failed; 1 ignored; 1 measured; 4 filtered out; finished in 5.40s

After

$ cargo bench "search"
   Compiling psbattletools v0.2.4 (/home/_/code/psbattletools)
    Finished `bench` profile [optimized] target(s) in 1.62s
     Running unittests src/main.rs (target/release/deps/psbattletools-7147a87a14c28c1d)

running 6 tests
test search::unit_tests::bench_handle_directory_1k                ... bench:   3,275,835.15 ns/iter (+/- 258,127.03)
test search::unit_tests::bench_handle_directory_1k_absent         ... bench:   1,495,995.55 ns/iter (+/- 76,207.67)
test search::unit_tests::bench_parse                              ... bench:       3,867.18 ns/iter (+/- 296.49)
test search::unit_tests::bench_parse_forfeit_wins_only            ... bench:         735.51 ns/iter (+/- 78.50)
test search::unit_tests::bench_parse_forfeits_only                ... bench:       3,196.23 ns/iter (+/- 441.67)
test search::unit_tests::bench_parse_wins_only                    ... bench:         739.44 ns/iter (+/- 25.11)

test result: ok. 0 passed; 0 failed; 0 ignored; 6 measured; 14 filtered out; finished in 11.17s

     Running benches/integration.rs (target/release/deps/integration-020071ea7b7992b0)

running 2 tests
test bench_old_battlesearch_1k ... ignored
test bench_search_1k           ... bench:   6,041,682.95 ns/iter (+/- 1,104,569.47)

test result: ok. 0 passed; 0 failed; 1 ignored; 1 measured; 4 filtered out; finished in 5.65s

perf stats

Constructed a directory with 500K files, 8GB total.

Uncached

The target usecase.
Dominated by FS and disk seeks.

echo 3 | sudo tee /proc/sys/vm/drop_caches to clear the OS's file cache between each run. My drive doesn't have its own cache.

Before

$perf stat -- psbattletools --threads 3 search alice ~/code/pokemon-showdown/logs/2024-11/gen1randombattle/2024-11-10/
Parsing /home/_/code/pokemon-showdown/logs/2024-11/gen1randombattle/2024-11-10/...

 Performance counter stats for 'psbattletools --threads 3 search alice /home/_/code/pokemon-showdown/logs/2024-11/gen1randombattle/2024-11-10/':

         18,384.52 msec task-clock:u                     #    0.222 CPUs utilized             
                 0      context-switches:u               #    0.000 /sec                      
                 0      cpu-migrations:u                 #    0.000 /sec                      
            11,486      page-faults:u                    #  624.765 /sec                      
     8,997,947,822      cycles:u                         #    0.489 GHz                       
     9,440,432,842      instructions:u                   #    1.05  insn per cycle            
     2,280,047,807      branches:u                       #  124.020 M/sec                     
        24,748,638      branch-misses:u                  #    1.09% of all branches           

      82.965860922 seconds time elapsed

       3.196036000 seconds user
      15.260381000 seconds sys

After

$perf stat -- ./target/release/psbattletools --threads 3 search alice ~/code/pokemon-showdown/logs/2024-11/gen1randombattle/2024-11-10/
Parsing /home/_/code/pokemon-showdown/logs/2024-11/gen1randombattle/2024-11-10/...

 Performance counter stats for './target/release/psbattletools --threads 3 search alice /home/_/code/pokemon-showdown/logs/2024-11/gen1randombattle/2024-11-10/':

         15,295.64 msec task-clock:u                     #    0.182 CPUs utilized             
                 0      context-switches:u               #    0.000 /sec                      
                 0      cpu-migrations:u                 #    0.000 /sec                      
            11,374      page-faults:u                    #  743.611 /sec                      
     5,439,202,925      cycles:u                         #    0.356 GHz                       
     3,971,634,215      instructions:u                   #    0.73  insn per cycle            
       795,169,931      branches:u                       #   51.987 M/sec                     
        10,860,116      branch-misses:u                  #    1.37% of all branches           

      84.188491028 seconds time elapsed

       1.957654000 seconds user
      13.414479000 seconds sys

Cached

Not a realistic scenario, but better isolates the CPU-intensive portion.

Before

$perf stat -- psbattletools --threads 3 search alice ~/code/pokemon-showdown/logs/2024-11/gen1randombattle/2024-11-10/
Parsing /home/_/code/pokemon-showdown/logs/2024-11/gen1randombattle/2024-11-10/...

 Performance counter stats for 'psbattletools --threads 3 search alice /home/_/code/pokemon-showdown/logs/2024-11/gen1randombattle/2024-11-10/':

          4,400.00 msec task-clock:u                     #    2.506 CPUs utilized             
                 0      context-switches:u               #    0.000 /sec                      
                 0      cpu-migrations:u                 #    0.000 /sec                      
            11,468      page-faults:u                    #    2.606 K/sec                     
     3,779,035,315      cycles:u                         #    0.859 GHz                       
     9,440,387,186      instructions:u                   #    2.50  insn per cycle            
     2,280,039,028      branches:u                       #  518.191 M/sec                     
         4,516,395      branch-misses:u                  #    0.20% of all branches           

       1.756031579 seconds time elapsed

       1.683201000 seconds user
       2.716097000 seconds sys

After

$perf stat -- ./target/release/psbattletools --threads 3 search alice ~/code/pokemon-showdown/logs/2024-11/gen1randombattle/2024-11-10/
Parsing /home/_/code/pokemon-showdown/logs/2024-11/gen1randombattle/2024-11-10/...

 Performance counter stats for './target/release/psbattletools --threads 3 search alice /home/_/code/pokemon-showdown/logs/2024-11/gen1randombattle/2024-11-10/':

          2,681.52 msec task-clock:u                     #    2.560 CPUs utilized             
                 0      context-switches:u               #    0.000 /sec                      
                 0      cpu-migrations:u                 #    0.000 /sec                      
            11,358      page-faults:u                    #    4.236 K/sec                     
     2,316,709,912      cycles:u                         #    0.864 GHz                       
     4,462,151,111      instructions:u                   #    1.93  insn per cycle            
       899,173,185      branches:u                       #  335.322 M/sec                     
         2,276,173      branch-misses:u                  #    0.25% of all branches           

       1.047270698 seconds time elapsed

       1.054629000 seconds user
       1.626428000 seconds sys

Load small prefix of file and attempt to use that to reject files
Would behave incorrectly if a string ended with a comma
Also fix 2 clippy warnings for redundant refs
In its infinite wisdom, clippy won't let me push my changes until this unrelated lint is fixed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant