Add lto to static builds #2147
Branch | lto |
Testbed | ubuntu-latest |
Benchmark | Latency (µs) |
---|---|
fibonacci 10 | 422.83 |
foldl arrays 50 | 1,631.60 |
foldl arrays 500 | 6,156.30 |
foldr strings 50 | 6,094.60 |
foldr strings 500 | 52,438.00 |
generate normal 250 | 42,386.00 |
generate normal 50 | 1,911.60 |
generate normal unchecked 1000 | 2,987.70 |
generate normal unchecked 200 | 684.20 |
pidigits 100 | 2,903.20 |
pipe normal 20 | 1,346.70 |
pipe normal 200 | 8,770.30 |
product 30 | 786.69 |
scalar 10 | 1,385.70 |
sum 30 | 772.36 |
If I'm counting right, we have three main ways of providing Nickel binaries
- pre-compiled binaries (handled here)
- binary compiled from source when using the crates.io release
- building Nickel from Nix/Nixpkgs
Do you think we could cheaply get LTO for the other ones? I haven't looked, but can we specify a profile to use when running cargo install, which might be different from release?
The nix part might need to add some knobs to avoid doing LTO on the CI but do it for the default package/default app of the flake. Then I suppose it needs to be done differently for the Nixpkgs release, but this can be handled at the next release - there is nothing we can do from the Nickel repo for now, I reckon.
I'm obviously lacking a ton of context here, so bear with me. I'm surprised that the release profile is used on CI, I would intuitively run CI with the development profile. But if you have a good reason to use a release-like profile for CI, I think the more natural solution is to create a new profile for CI with the well-tuned option, and have the release profile (which will be used by package manager) turn LTO on. You're not proposing this, so: am I missing something? |
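A minimal sketch of that alternative, assuming a hypothetical profile named ci (the name and its settings are illustrative and not taken from the Nickel repository):

```toml
# Cargo.toml (sketch only; the `ci` profile name is hypothetical)

# The release profile is what `cargo install` uses by default,
# so enabling LTO here reaches anyone building from source.
[profile.release]
lto = true

# CI reuses the release settings but turns LTO off to keep build times down.
[profile.ci]
inherits = "release"
lto = false
```

CI would then select the cheaper profile with cargo build --profile ci, while a plain release build picks up LTO.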
Definitely the easiest route would be to add LTO to the release profile, and then everyone would get it automatically. I'm not 100% sure that it's what we want for For nixpkgs, I guess we would either add LTO to the release profile or patch nixpkgs to use a release-lto profile.
I think this is just because the CI uses |
I see. Is the flake building purely development/CI builds not an option? Is the flake a distribution channel as well? But if it is, then you probably want LTO turned on for the flake as well, don't you? |
Personally, I use the flake only for its devShell (and indirectly for CI). Since we make the flake publicly available, I guess anyone could be using it for getting a nickel binary. I think organist was using our flake at some point, but currently they're using nickel from nixpkgs instead. Even if the flake was a distribution channel, I'm not sure we'd want to turn on LTO by default. It's the same logic as "cargo install", because the end-user is the one doing the compiling. One thing that makes the decision tricky is that going from debug to release builds roughly doubles the build time but delivers a huge performance benefit, while adding on LTO doubles the build time again but for a much smaller benefit. |
Are we able to estimate the performance benefit of LTO? As a rule of thumb, I think that when installing a program (as opposed to “just” compiling), you want the best build (that is, the recommended build). Build time doesn't matter that much. In the case of the flake, it could also be mitigated by providing a cache. |
I'm not sure how representative it is, but we do have a benchmark suite. On my machine, LTO delivers about a 10% improvement. Maybe that's substantial enough to just live with the increased build times? |
I would say this is what we want. When you
This is a good point. To be honest I'm not a hundred percent sure about the answer. One might just be that it was easier to do in the flake (instead of parametrizing the profile), as we share the same sub-derivations e.g. to build the default package of the flake. It's definitely a bad reason because it's not very hard to parametrize. Another possibility is caching. It's maybe moot now that we use buildxx for the CI, but before that, hitting the cache was quite important to get reasonable CI time, and it might be the case that building with different profiles would fill it up more quickly (it's easy to blow up a free Cachix instance). I remember we were quite careful regarding those questions when developing the flake at the time. Since we use flake checks, could it be the case that a random user installing from the flake will build everything twice, as I believe checks are run by default? But those are suppositions. It also makes a lot of sense to build stuff in debug mode in CI. Maybe a reasonable answer is that flake checks and CI checks should be separate, or at least not necessarily perfectly equal. |
I only ever run |
Sure, but would that be a problem to run |
Sure, that would be fine for me. But as someone who only ever runs |
I think I had a wrong understanding of |
Ok, this latest version turns on LTO for the release profile, and then hopefully modifies With any luck, I also fixed the caching issues causing cargo to rebuild many of the dependencies. We now build and cache dependencies with debug and release profiles, and with/without |
Fixes #2116.
I timed (clean) lto vs non-lto builds, and on my system the lto builds are almost twice as slow (180-ish seconds vs 100-ish). For that reason, I didn't add lto to the release profile, but created a new one with lto. I switched the static builds to use lto, so that release binaries will get the benefit but our other CI builds won't get slowed down.
I also measured lto = "thin", but it increased the size of the main binary (I didn't run benchmarks, so I don't know the effect on runtime performance).
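For reference, the separate LTO profile described above could be sketched roughly as follows. The name release-lto is an assumption borrowed from the discussion; the profile actually added in the PR may be named or configured differently.

```toml
# Cargo.toml (sketch only; the profile name is an assumption)

# A custom profile that inherits everything from `release` and adds LTO.
[profile.release-lto]
inherits = "release"
lto = true  # "fat" LTO; `lto = "thin"` is the variant measured above
```

It would be selected with cargo build --profile release-lto, and Cargo places the artifacts under target/release-lto/, so ordinary release builds are unaffected.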