Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More granular invalidation #3810

Open
davidmorgan opened this issue Feb 1, 2025 · 10 comments
Open

More granular invalidation #3810

davidmorgan opened this issue Feb 1, 2025 · 10 comments
Assignees
Labels
type-enhancement A request for a change that isn't a bug type-performance

Comments

@davidmorgan
Copy link
Contributor

For generators using the analyzer, it would be good to rerun them less often, based on a finer-grained analysis of what changed.

@davidmorgan
Copy link
Contributor Author

@rrousselGit asked

I would be interested in making a third-party API for builders to optimize recomputation of code-generators.

For that purpose, could we maybe have a way to opt-out to the default "listen to all imported libraries"? Followed by a listen(library)

there are a few parts to that problem :)

The first is whether we are willing to give generators an API that can return incorrect results. A generator can ask the analyzer about any code; if we don't rerun the generator when something changes, then maybe it asked about some transitive dependency and now we are using incorrect output.

In general I think the answer is "we want it to be easy for a generator to be correct", so it's up to build_runner to handle this, not every author of a generator.

I realize that "and it will be slow" is not an acceptable answer here either :)

For deps that are outside the current library cycle, there is already code to do better invalidation, it's used in google3. I'm not sure if it's wired up for build_runner today. The way it works is that the analyzer reports which inputs were unused and build_runner uses that to do better invalidation. I'll have to check.

For deps that are inside the current library cycle, this doesn't work because the analyzer does read all the files in the library cycle so they are inputs. Here is where we would need new code to support finer-grained invalidation. There is definitely interest in making this work.

But, I'll be looking at the raw performance first as there seem to be easy wins there :)

@davidmorgan davidmorgan self-assigned this Feb 1, 2025
@rrousselGit
Copy link

In general I think the answer is "we want it to be easy for a generator to be correct", so it's up to build_runner to handle this, not every author of a generator.

For that purpose, it could be a low-level API that's primarily used by package authors.

If it's obscure enough, it won't get used by the average folks

@davidmorgan
Copy link
Contributor Author

Hmmmmm we could experiment with giving generators access to the "unused inputs" feature.

So the generator would receive a set of the files build_runner thinks it might have used, and the generator can explicitly say "no I didn't use it".

There are some edge cases that are going to be horrible to debug: if a user writes code that introduces a dependency that the generator didn't figure out, then they're going to be staring at a stale build and not know why.

But that does seem much less important than much-too-frequent rebuilds, and it's likely to work just fine for common cases even if the generator is not strictly correct.

This is analogous to how bazel works: it gives every build action the capability to say "here are the inputs I didn't use", https://bazel.build/rules/lib/builtins/actions#parameters_5, the unused_inputs_list. It just trusts the build action, if the action reports wrong unused imports then you just get wrong invalidation.

@davidmorgan davidmorgan added type-performance type-enhancement A request for a change that isn't a bug labels Feb 1, 2025
@rrousselGit
Copy link

We can already report an asset as unused. But we lack the list of currently used assets, so if a generator wants to report an asset as unused, a bunch of work is necessary.

It's generally much easier to report "I used this" than "I didn't use this"

@jakemac53
Copy link
Contributor

The reason we support "I didn't use this" is because that is what bazel supports - and it also maps well to how we run the compilers.

I agree it is a slower and somewhat weird way of doing things though, so we might want to explore something else.

For deps that are outside the current library cycle, there is already code to do better invalidation, it's used in google3. I'm not sure if it's wired up for build_runner today. The way it works is that the analyzer reports which inputs were unused and build_runner uses that to do better invalidation. I'll have to check.

This tracking/reporting is based on access to summary files which were not read. We don't use summaries in build_runner, and it in general doesn't make sense for the build_runner model as it exists today to use them.

We would need a totally different model of dependency tracking for this (but it would be a great if we had this).

@rrousselGit
Copy link

To be clear: Making a different model of dependency-tracking is exactly what I want to work on.

I have a pretty good idea of what I'd think would both:

  • prevent developers from using data without listening to it
  • make devs listen to only what they use

In fact, I'd like a way to be more granular than the library level.

When you have:

@annotation
class A {}

@annotation
class B{}

Changing A shouldn't invalidate B nor anything that depends purely on B.

But if I want to implement that today, afaik I pretty much have to make my own build_runner.

@jakemac53
Copy link
Contributor

We have much lower hanging fruit than trying to do better sub-library granularity invalidation, I think we should first focus on just trying to get whole libraries to not get invalidated when any transitive dep changes.

The main issue right now is actually just the tracking and invalidation when the number of transitive deps is large, since the transitive deps are tracked per input+builder. Trimming that set to immediate deps only in most cases, or even just making the tracking/hashing more efficient (through transitive deps hashes etc), will give significant wins here without compromising correctness.

@rrousselGit
Copy link

I'm just looking for stuff I can do on my own that would be fun too. Controlling cache invalidation with a good API is my kind of thing ; but I don't see a way to give anything to the ecosystem there with how build is architectured.

I'd rather not fork build just for that.

@jakemac53
Copy link
Contributor

I'm just looking for stuff I can do on my own that would be fun too. Controlling cache invalidation with a good API is my kind of thing ; but I don't see a way to give anything to the ecosystem there with how build is architectured.

Fwiw the Resolvers implementation is something that can be swapped out at the build_runner_core level - but yeah you would lose some of the stuff build_runner does if you just build on top of build_runner_core.

We could potentially add some way of plugging in a custom Resolvers implementation into the generated build script - but I don't know exactly what it would look like and its probably not high priority.

@davidmorgan
Copy link
Contributor Author

I'm very much open to changing large things so that they work better, or so contributions are possible.

However it'll take me at least a few weeks to get up to speed, this is the first time I've worked closely in the build_runner codebase, although I've worked with it for a long time :)

So I suggest we revisit the question of what's possible here some time mid February.

Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type-enhancement A request for a change that isn't a bug type-performance
Projects
None yet
Development

No branches or pull requests

3 participants