-
Notifications
You must be signed in to change notification settings - Fork 13.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix linker-plugin-lto only doing thin lto #136840
base: master
Are you sure you want to change the base?
Conversation
When rust provides LLVM bitcode files to lld and the bitcode contains function summaries as used for thin lto, lld defaults to using thin lto. This prevents some optimizations that are only applied for fat lto. I ran into this with the amdgpu backend and was able to create a test there. Unfortunately, I wasn’t able to recreate the same test on x86, the difference between thin and fat lto seems less pronounced there. The important part of the change is setting the `ThinLTO=0` module flag. The rest of the changes are fixing the `<function>.kd` symbol getting exported for amdhsa with linker-plugin-lto.
Could not assign reviewer from: |
This PR modifies cc @jieyouxu |
I have barely any idea about LTO besides "it happens and it involves dlopening a compiler and shoving its serialized data back in it" tbh soo |
Unfortunately I have no clue either, so r? compiler |
For reference, the code that switches to thin lto when the flag is not set is here: https://github.com/llvm/llvm-project/blob/e258bca9505f35e0a22cb213a305eea9b76d11ea/llvm/lib/Bitcode/Writer/BitcodeWriter.cpp#L4441-L4446 // By default we compile with ThinLTO if the module has a summary, but the
// client can request full LTO with a module flag.
bool IsThinLTO = true;
if (auto *MD =
mdconst::extract_or_null<ConstantInt>(M.getModuleFlag("ThinLTO")))
IsThinLTO = MD->getZExtValue(); The code in clang that sets the flag, which is replicated here for Rust is here: https://github.com/llvm/llvm-project/blob/560149b5e3c891c64899e9912e29467a69dc3a4c/clang/lib/CodeGen/BackendUtil.cpp#L1150 if (!TheModule->getModuleFlag("ThinLTO") && !CodeGenOpts.UnifiedLTO)
TheModule->addModuleFlag(llvm::Module::Error, "ThinLTO", uint32_t(0)); |
// Disable ThinLTO if fat lto is requested. Otherwise lld defaults to thin lto. | ||
if sess.lto() == config::Lto::Fat { | ||
llvm::add_module_flag_u32(llmod, llvm::ModuleFlagMergeBehavior::Override, "ThinLTO", 0); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if a dependency is built with lto=true (aka lto=fat), but then the user wants to use thinLTO? I'm pretty sure the standard library is built with lto=true for example, but that shouldn't prevent thinLTO from ever working.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question, it seems to change somewhat, but still work in general. I added a test for this.
What changes: Without this change, the test passes when
lib is compiled with O0
and main with O3
and
- lib uses lto=thin and main uses lto=thin
- lib uses lto=thin and main uses lto=fat
- lib uses lto=fat and main uses lto=thin
- lib uses lto=fat and main uses lto=fat
With this change, all of these keep passing except for case 3 (lib uses lto=fat and main uses lto=thin).
When lib is compiled with O1
, O2
or O3
, case 3 passes as well.
I assume this is the important case, as the standard library is compiled with optimizations.
(And lto with O0
is kinda questionable, except maybe for nvptx and amdgpu, but they require lto=fat anyway.)
let dylib_pgo = crate_type == CrateType::Dylib || sess.opts.cg.profile_generate.enabled(); | ||
// When compiling for amdhsa, every kernel function generates a <function name>.kd symbol. | ||
// This symbol gets removed again when using linker-plugin-lto. Disable gc_sections to keep | ||
// the symbol. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the version script should list those symbols instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The symbol is in the version script with #135909. I’m not 100% sure why it doesn’t work with linker-plugin-lto (putting it in the version script works with normal lto). My guess is that it’s the same problem as with the defined symbols, lld doesn’t see the symbol defined at the start, because the backend hasn’t created it yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked into lld to be sure and it seems to be indeed the case that the <kernel>.kd
symbol is removed with --gc-sections
because it is not defined at the beginning when doing full linker lto.
For future reference, this is what happens in more details:
SymbolTable::scanVersionScript
looks at entries in the linker script and calls theassignExact
lambda. It ignores symbols that are not defined (unless undefined symbols are forbidden, then it aborts).elf::parseVersionAndComputeIsPreemptible
setsisExported
of existing symbols totrue
.- Now bitcode inputs get compiled, this generates
<kernel>.kd
symbols for amdhsa. MarkLive<ELFT>::run
checksisExported
and then marks these symbols asisLive
.demoteSymbolsAndComputeIsPreemptible
demotes symbols that are not marked asisLive
. This makes the symbol undefined.- Later on, in
Add symbols to symtabs
, undefined symbols are skipped
Not passing --gc-sections
changes elf::markLive
(with gc, this calls MarkLive<ELFT>::run
) to mark all sections as needed and this skips further live checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to be this issue, I’ll mention it in the comment: llvm/llvm-project#119479
When rust provides LLVM bitcode files to lld and the bitcode contains function summaries as used for thin lto, lld defaults to using thin lto. This prevents some optimizations that are only applied for fat lto. I ran into this with the amdgpu backend and was able to create a test there. Unfortunately, I wasn’t able to recreate the same test on x86, the difference between thin and fat lto seems less pronounced there.
The important part of the change is setting the
ThinLTO=0
module flag. The rest of the changes are fixing the<function>.kd
symbol getting exported for amdhsa with linker-plugin-lto.Tracking issue: #135024
r? @workingjubilee, as you’ve been reviewing most other amdgpu patches, not sure if there should be other reviewers for lto.