Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix linker-plugin-lto only doing thin lto #136840

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

Flakebi
Copy link
Contributor

@Flakebi Flakebi commented Feb 10, 2025

When rust provides LLVM bitcode files to lld and the bitcode contains function summaries as used for thin lto, lld defaults to using thin lto. This prevents some optimizations that are only applied for fat lto. I ran into this with the amdgpu backend and was able to create a test there. Unfortunately, I wasn’t able to recreate the same test on x86, the difference between thin and fat lto seems less pronounced there.

The important part of the change is setting the ThinLTO=0 module flag. The rest of the changes are fixing the <function>.kd symbol getting exported for amdhsa with linker-plugin-lto.

Tracking issue: #135024

r? @workingjubilee, as you’ve been reviewing most other amdgpu patches, not sure if there should be other reviewers for lto.

When rust provides LLVM bitcode files to lld and the bitcode contains
function summaries as used for thin lto, lld defaults to using thin lto.
This prevents some optimizations that are only applied for fat lto. I
ran into this with the amdgpu backend and was able to create a test
there. Unfortunately, I wasn’t able to recreate the same test on x86,
the difference between thin and fat lto seems less pronounced there.

The important part of the change is setting the `ThinLTO=0` module flag.
The rest of the changes are fixing the `<function>.kd` symbol getting
exported for amdhsa with linker-plugin-lto.
@rustbot
Copy link
Collaborator

rustbot commented Feb 10, 2025

Could not assign reviewer from: workingjubilee.
User(s) workingjubilee are either the PR author, already assigned, or on vacation. Please use r? to specify someone else to assign.

@rustbot
Copy link
Collaborator

rustbot commented Feb 10, 2025

r? @jieyouxu

rustbot has assigned @jieyouxu.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added A-run-make Area: port run-make Makefiles to rmake.rs S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Feb 10, 2025
@rustbot
Copy link
Collaborator

rustbot commented Feb 10, 2025

This PR modifies tests/run-make/. If this PR is trying to port a Makefile
run-make test to use rmake.rs, please update the
run-make port tracking issue
so we can track our progress. You can either modify the tracking issue
directly, or you can comment on the tracking issue and link this PR.

cc @jieyouxu

@Flakebi Flakebi mentioned this pull request Feb 10, 2025
16 tasks
@workingjubilee
Copy link
Member

I have barely any idea about LTO besides "it happens and it involves dlopening a compiler and shoving its serialized data back in it" tbh soo

@jieyouxu
Copy link
Member

Unfortunately I have no clue either, so

r? compiler

@rustbot rustbot assigned fee1-dead and unassigned jieyouxu Feb 11, 2025
@Flakebi
Copy link
Contributor Author

Flakebi commented Feb 11, 2025

For reference, the code that switches to thin lto when the flag is not set is here: https://github.com/llvm/llvm-project/blob/e258bca9505f35e0a22cb213a305eea9b76d11ea/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp#L4441-L4446

  // By default we compile with ThinLTO if the module has a summary, but the
  // client can request full LTO with a module flag.
  bool IsThinLTO = true;
  if (auto *MD =
          mdconst::extract_or_null<ConstantInt>(M.getModuleFlag("ThinLTO")))
    IsThinLTO = MD->getZExtValue();

The code in clang that sets the flag, which is replicated here for Rust is here: https://github.com/llvm/llvm-project/blob/560149b5e3c891c64899e9912e29467a69dc3a4c/clang/lib/CodeGen/BackendUtil.cpp#L1150

        if (!TheModule->getModuleFlag("ThinLTO") && !CodeGenOpts.UnifiedLTO)
          TheModule->addModuleFlag(llvm::Module::Error, "ThinLTO", uint32_t(0));

// Disable ThinLTO if fat lto is requested. Otherwise lld defaults to thin lto.
if sess.lto() == config::Lto::Fat {
llvm::add_module_flag_u32(llmod, llvm::ModuleFlagMergeBehavior::Override, "ThinLTO", 0);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if a dependency is built with lto=true (aka lto=fat), but then the user wants to use thinLTO? I'm pretty sure the standard library is built with lto=true for example, but that shouldn't prevent thinLTO from ever working.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, it seems to change somewhat, but still work in general. I added a test for this.
What changes: Without this change, the test passes when
lib is compiled with O0 and main with O3 and

  1. lib uses lto=thin and main uses lto=thin
  2. lib uses lto=thin and main uses lto=fat
  3. lib uses lto=fat and main uses lto=thin
  4. lib uses lto=fat and main uses lto=fat

With this change, all of these keep passing except for case 3 (lib uses lto=fat and main uses lto=thin).
When lib is compiled with O1, O2 or O3, case 3 passes as well.
I assume this is the important case, as the standard library is compiled with optimizations.
(And lto with O0 is kinda questionable, except maybe for nvptx and amdgpu, but they require lto=fat anyway.)

let dylib_pgo = crate_type == CrateType::Dylib || sess.opts.cg.profile_generate.enabled();
// When compiling for amdhsa, every kernel function generates a <function name>.kd symbol.
// This symbol gets removed again when using linker-plugin-lto. Disable gc_sections to keep
// the symbol.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the version script should list those symbols instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The symbol is in the version script with #135909. I’m not 100% sure why it doesn’t work with linker-plugin-lto (putting it in the version script works with normal lto). My guess is that it’s the same problem as with the defined symbols, lld doesn’t see the symbol defined at the start, because the backend hasn’t created it yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into lld to be sure and it seems to be indeed the case that the <kernel>.kd symbol is removed with --gc-sections because it is not defined at the beginning when doing full linker lto.

For future reference, this is what happens in more details:

  1. SymbolTable::scanVersionScript looks at entries in the linker script and calls the assignExact lambda. It ignores symbols that are not defined (unless undefined symbols are forbidden, then it aborts).
  2. elf::parseVersionAndComputeIsPreemptible sets isExported of existing symbols to true.
  3. Now bitcode inputs get compiled, this generates <kernel>.kd symbols for amdhsa.
  4. MarkLive<ELFT>::run checks isExported and then marks these symbols as isLive.
  5. demoteSymbolsAndComputeIsPreemptible demotes symbols that are not marked as isLive. This makes the symbol undefined.
  6. Later on, in Add symbols to symtabs, undefined symbols are skipped

Not passing --gc-sections changes elf::markLive (with gc, this calls MarkLive<ELFT>::run) to mark all sections as needed and this skips further live checks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to be this issue, I’ll mention it in the comment: llvm/llvm-project#119479

@rust-log-analyzer

This comment has been minimized.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-run-make Area: port run-make Makefiles to rmake.rs S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants