Performance regression on toy problem, but not for opt-level=1. #135241

TheThirdOne · 2025-01-08T09:29:50Z

I tried this code:

fn hanoi_int_inner(n : usize, start: usize, end : usize, aux : usize) -> usize {
    if n == 0 {
        0
    } else {
        let mut out = hanoi_int_inner(n-1, start, aux, end);
        out += 1;
        out += hanoi_int_inner(n-1, aux, end, start);
        out
    }
}

fn hanoi_int(n: usize) -> usize {
    hanoi_int_inner(n,0,1,2)
}

I expected compiling with rustc -C opt-level=3 to be faster than rustc -C opt-level=1. I have not seen any resources online giving a recommendation for opt-level=1 or any downside to more optimization other than binary size.

However, only with opt-level=1 does rustc figure out that start, end, and aux are useless and combine the two recursive calls. Combining the two recursive calls results in linear time complexity vs exponential so it is immediately noticeable even without any tight timing (> minutes vs milliseconds).

Previous versions correctly compile this toy code into non-exponential machine code. It seems to have broken around 1.70 and is currently broken on both stable and nightly.

I am totally willing to investigate this, but I would need some guidance because I have never looked deeply into LLVM's or Rust's optimization passes.

Version it worked on

It most recently worked on rustc --version --verbose:

rustc 1.69.0 (84c898d65 2023-04-16)
binary: rustc
commit-hash: 84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc
commit-date: 2023-04-16
host: x86_64-unknown-linux-gnu
release: 1.69.0
LLVM version: 15.0.7

Version with regression

rustc --version --verbose:

rustc 1.70.0 (90c541806 2023-05-31)
binary: rustc
commit-hash: 90c541806f23a127002de5b4038be731ba1458ca
commit-date: 2023-05-31
host: x86_64-unknown-linux-gnu
release: 1.70.0
LLVM version: 16.0.2

rustc 1.83.0 (90b35a623 2024-11-26)
binary: rustc
commit-hash: 90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf
commit-date: 2024-11-26
host: x86_64-unknown-linux-gnu
release: 1.83.0
LLVM version: 19.1.1

rustc 1.86.0-nightly (ad211ced8 2025-01-07)
binary: rustc
commit-hash: ad211ced81509462cdfe4c29ed10f97279a0acae
commit-date: 2025-01-07
host: x86_64-unknown-linux-gnu
release: 1.86.0-nightly
LLVM version: 19.1.6

The text was updated successfully, but these errors were encountered:

nikic · 2025-01-08T09:49:58Z

See https://rust.godbolt.org/z/rYEWWE47r. The difference is that at O1 dead arguments are eliminated first and tail call optimization is performed afterwards. At O3 it's the other way around, and at that point that fact that the values go through connected cyclic phis makes it harder to determine that these are ultimately dead.

TheThirdOne · 2025-01-08T09:58:49Z

Ordering of dead arguments vs tail call would explain the O1 vs O3. It would seem that is enough for LLVM to do the final push to good output.

See https://rust.godbolt.org/z/3Ycszf4Ea. In 1.69 and presumably earlier, rust is essentially doing the entire optimization and emitting a shift instead of the tail call.

apiraino · 2025-01-08T15:18:45Z

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-low

TheThirdOne added C-bug Category: This is a bug. regression-untriaged Untriaged performance or correctness regression. labels Jan 8, 2025

rustbot added I-prioritize Issue: Indicates that prioritization has been requested for this issue. needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Jan 8, 2025

jieyouxu added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Jan 8, 2025

rustbot added P-low Low priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Jan 8, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance regression on toy problem, but not for opt-level=1. #135241

Performance regression on toy problem, but not for opt-level=1. #135241

TheThirdOne commented Jan 8, 2025

nikic commented Jan 8, 2025

TheThirdOne commented Jan 8, 2025 •

edited

Loading

apiraino commented Jan 8, 2025

Performance regression on toy problem, but not for opt-level=1. #135241

Performance regression on toy problem, but not for opt-level=1. #135241

Comments

TheThirdOne commented Jan 8, 2025

Version it worked on

Version with regression

nikic commented Jan 8, 2025

TheThirdOne commented Jan 8, 2025 • edited Loading

apiraino commented Jan 8, 2025

TheThirdOne commented Jan 8, 2025 •

edited

Loading