Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance regression on toy problem, but not for opt-level=1. #135241

Open
TheThirdOne opened this issue Jan 8, 2025 · 3 comments
Open

Performance regression on toy problem, but not for opt-level=1. #135241

TheThirdOne opened this issue Jan 8, 2025 · 3 comments
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such I-slow Issue: Problems and improvements with respect to performance of generated code. P-low Low priority regression-untriaged Untriaged performance or correctness regression. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@TheThirdOne
Copy link

I tried this code:

fn hanoi_int_inner(n : usize, start: usize, end : usize, aux : usize) -> usize {
    if n == 0 {
        0
    } else {
        let mut out = hanoi_int_inner(n-1, start, aux, end);
        out += 1;
        out += hanoi_int_inner(n-1, aux, end, start);
        out
    }
}

fn hanoi_int(n: usize) -> usize {
    hanoi_int_inner(n,0,1,2)
} 

I expected compiling with rustc -C opt-level=3 to be faster than rustc -C opt-level=1. I have not seen any resources online giving a recommendation for opt-level=1 or any downside to more optimization other than binary size.

However, only with opt-level=1 does rustc figure out that start, end, and aux are useless and combine the two recursive calls. Combining the two recursive calls results in linear time complexity vs exponential so it is immediately noticeable even without any tight timing (> minutes vs milliseconds).

Previous versions correctly compile this toy code into non-exponential machine code. It seems to have broken around 1.70 and is currently broken on both stable and nightly.

I am totally willing to investigate this, but I would need some guidance because I have never looked deeply into LLVM's or Rust's optimization passes.

Version it worked on

It most recently worked on rustc --version --verbose:

rustc 1.69.0 (84c898d65 2023-04-16)
binary: rustc
commit-hash: 84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc
commit-date: 2023-04-16
host: x86_64-unknown-linux-gnu
release: 1.69.0
LLVM version: 15.0.7

Version with regression

rustc --version --verbose:

rustc 1.70.0 (90c541806 2023-05-31)
binary: rustc
commit-hash: 90c541806f23a127002de5b4038be731ba1458ca
commit-date: 2023-05-31
host: x86_64-unknown-linux-gnu
release: 1.70.0
LLVM version: 16.0.2

rustc 1.83.0 (90b35a623 2024-11-26)
binary: rustc
commit-hash: 90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf
commit-date: 2024-11-26
host: x86_64-unknown-linux-gnu
release: 1.83.0
LLVM version: 19.1.1

rustc 1.86.0-nightly (ad211ced8 2025-01-07)
binary: rustc
commit-hash: ad211ced81509462cdfe4c29ed10f97279a0acae
commit-date: 2025-01-07
host: x86_64-unknown-linux-gnu
release: 1.86.0-nightly
LLVM version: 19.1.6
@TheThirdOne TheThirdOne added C-bug Category: This is a bug. regression-untriaged Untriaged performance or correctness regression. labels Jan 8, 2025
@rustbot rustbot added I-prioritize Issue: Indicates that prioritization has been requested for this issue. needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Jan 8, 2025
@nikic nikic added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such and removed C-bug Category: This is a bug. labels Jan 8, 2025
@jieyouxu jieyouxu added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Jan 8, 2025
@nikic
Copy link
Contributor

nikic commented Jan 8, 2025

See https://rust.godbolt.org/z/rYEWWE47r. The difference is that at O1 dead arguments are eliminated first and tail call optimization is performed afterwards. At O3 it's the other way around, and at that point that fact that the values go through connected cyclic phis makes it harder to determine that these are ultimately dead.

@TheThirdOne
Copy link
Author

TheThirdOne commented Jan 8, 2025

Ordering of dead arguments vs tail call would explain the O1 vs O3. It would seem that is enough for LLVM to do the final push to good output.

See https://rust.godbolt.org/z/3Ycszf4Ea. In 1.69 and presumably earlier, rust is essentially doing the entire optimization and emitting a shift instead of the tail call.

@apiraino
Copy link
Contributor

apiraino commented Jan 8, 2025

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-low

@rustbot rustbot added P-low Low priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Jan 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such I-slow Issue: Problems and improvements with respect to performance of generated code. P-low Low priority regression-untriaged Untriaged performance or correctness regression. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

5 participants