Copy edits (#111)
Mostly make language more direct and active, and omit needless words.
elharo authored Nov 21, 2023
1 parent 25167ec commit d7e9bf1
Showing 1 changed file with 27 additions and 29 deletions.
56 changes: 27 additions & 29 deletions src/site/markdown/index.md
@@ -17,52 +17,50 @@

## Overview

Build cache is an extension targeted to simplify and make more efficient work with large builds in Maven.
Build cache is an extension that makes large Maven builds more efficient.

A combination of features achieves that:

* Incremental builds works on the modified part of the project graph part only
* Subtree support for multimodule projects to work on the part of the codebase in isolation
* Version normalization to support project version agnostic caches
* Project state restoration (partial) to avoid expensive tasks (code generation and similar)
* Incremental builds work on the modified part of the project graph only
* Subtree support for multimodule projects builds part of the codebase in isolation
* Version normalization supports project version agnostic caches
* Project state restoration (partial) avoids repeating expensive tasks like code generation

Large projects usually pose scalability challenges, and working with such projects requires a build tool that scales.
Large projects pose scalability challenges, and working with such projects requires a build tool that scales.
The cache
extension addresses that with incremental build execution and the ability to efficiently work on sub-parts of a larger
project without building and installing dependencies from the larger project. Though, features implemented in Maven
should also give noticeable benefits in medium and small-sized projects.
project without building and installing dependencies from the larger project.

### Cache concepts

The idea of the build cache is to calculate a key from module inputs, store outputs in the cache, and restore them later
transparently to the standard Maven core. The cache deterministically associates each project state with a unique key
The build cache calculates a key from module inputs, stores outputs in the cache, and transparently restores them later to the standard Maven core. The cache associates each project state with a unique key
and restores it in subsequent builds. It analyzes source code, project model,
plugins, and their parameters. Projects with the same key are up-to-date (not changed) and could be safely restored from
the cache. Projects producing different keys are out-of-date (changed), and the cache fully rebuilds them. In the latter
plugins, and their parameters. Projects with the same key are up-to-date (not changed) and can be restored from
the cache. Projects that produce different keys are out-of-date (changed), and the cache fully rebuilds them. In the latter
case, the cache does not make any
interventions to the build execution logic and delegates build work to the standard maven Maven core. This approach
interventions in the build execution logic and delegates build work to the standard Maven core. This approach
ensures that
artifacts produced in the presence of a cache are equivalent to the result produced by a standard Maven build.
To achieve an accurate key calculation, the build-cache extension combines automatic introspection
of [project object model](https://maven.apache.org/pom.html#What_is_the_POM) and fine-grained tuning using
a configuration file. Source code content fingerprinting is digests based, which is more reliable over
widely used file timestamps in tools like Make or Apache Ant. Cache outputs could be shared using a remote cache.
a configuration file. Source code content fingerprinting is digest-based, which is more reliable than
the file timestamps used in tools like Make or Apache Ant. Cache outputs can be shared using a remote cache.
Deterministic inputs calculation allows distributed and parallel builds running in heterogeneous environments (like a
cloud of build agents) to efficiently reuse cached build artifacts as soon as they are published. Therefore, incremental
Maven is particularly well-suited for large Maven
projects that have a significant number of small modules. Remote cache, combined with relocatable inputs
The build cache is particularly useful for large Maven
projects that have a significant number of small modules. Remote caching, combined with relocatable inputs
identification, effectively enables the "change once - build once" approach across all environments.
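
The key-then-restore flow described in this section can be sketched roughly as follows. This is a minimal illustration and not the extension's actual algorithm: `project_key`, `BuildCache`, and the choice of SHA-256 are assumptions made for the example, and the real extension reads its inputs from the project model and its configuration file.

```python
import hashlib


def project_key(sources, effective_pom, plugin_params):
    """Return a hex digest identifying one project state (hypothetical helper)."""
    digest = hashlib.sha256()
    # Sort paths so the key does not depend on file listing order.
    for path in sorted(sources):
        digest.update(path.encode("utf-8") + b"\0")
        # Fingerprint file content, not its modification time.
        digest.update(hashlib.sha256(sources[path]).digest())
    digest.update(effective_pom.encode("utf-8") + b"\0")
    # Plugin parameters are part of the inputs as well.
    for name in sorted(plugin_params):
        digest.update(f"{name}={plugin_params[name]}".encode("utf-8") + b"\0")
    return digest.hexdigest()


class BuildCache:
    """In-memory stand-in for a local or remote cache (assumption)."""

    def __init__(self):
        self._entries = {}

    def build(self, sources, effective_pom, plugin_params, run_build):
        """Return (outputs, restored), running the build only on a cache miss."""
        key = project_key(sources, effective_pom, plugin_params)
        if key in self._entries:
            # Same key: the project is up-to-date, restore its outputs.
            return self._entries[key], True
        # Different key: delegate to the normal build and store the result.
        outputs = run_build()
        self._entries[key] = outputs
        return outputs, False
```

Because the key depends only on content, touching a file without changing it does not invalidate the entry, which is where a digest differs from a timestamp comparison.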

### Maven insights

Maven is a proven tool with a long history and core design established many years ago. Historically, Maven's core was
designed with generic stable interfaces that don't have a concept of inputs and outputs. It just runs as configured, but
the core does not control the inputs and effects of the run. Most commonly, artifacts produced in the same build
environment from the same source code will be considered equivalent. But even two identically looking builds from the
same source code could have two different results. The question here is tolerance level - can you accept particular
environment from the same source code will be considered equivalent. But even two identical-looking builds from the
same source code can have two different results. The question here is tolerance level: can you accept particular
discrepancies? Though technical differences between artifacts like timestamps in manifests are largely ignored, when
compilers used are of different levels, it is likely a critical difference. Should the produced artifacts be considered
equivalents? Yes and No answers are possible and could be desirable in different scenarios. When productivity
equivalent? Yes and No answers are possible and could be desirable in different scenarios. When productivity
and performance are the primary concerns, it could be beneficial to tolerate insignificant discrepancies and maximize
the reuse. As long as correctness is in focus, there could be a demand to comply with the exact release requirements. In
the same way as Maven, the cache correctness is ensured by proper build configuration and control over the build
@@ -72,11 +70,11 @@ environment. As Maven itself, the cached result is just an approximation of anot
### Implementation insights

Simply put, the build cache is a hash function that takes a Maven project and produces a unique key. Then the key is
used to store and restore build results. Because of different factors, there could be
used to store and restore build results. Because of different factors, there can be
collisions and instabilities in the produced key. A collision happens when the semantically different builds have the
same key and will result in unintended reuse. Instability means that the same input yields different keys resulting in
cache misses. The ultimate target is to find a tradeoff between correctness and performance by configuring cache
processing rules in an xml file.
processing rules in an XML file.
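
To make collisions and instabilities concrete, here is a small sketch under stated assumptions. The three key functions and their parameters (`java_release`, `checkout_dir`) are hypothetical and chosen only to show what happens when a key covers too few or too many inputs; they do not reflect the extension's configuration options.

```python
import hashlib


def key_of(*parts):
    """Hash an ordered list of string inputs into a hex key."""
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


def loose_key(source, java_release):
    # Too few inputs: the compiler level is ignored, so semantically
    # different builds share a key (collision, unintended reuse).
    return key_of(source)


def unstable_key(source, java_release, checkout_dir):
    # Too many inputs: the checkout location is irrelevant to the result,
    # so identical builds get different keys (instability, cache miss).
    return key_of(source, java_release, checkout_dir)


def balanced_key(source, java_release):
    # Only inputs that affect the output take part in the key.
    return key_of(source, java_release)
```

Tuning the cache processing rules amounts to moving each project toward the third function: include every input that changes the result and exclude everything that does not.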

To maximize correctness:

@@ -107,17 +105,17 @@ still the build owner's responsibility to verify build outcomes.

### Recommended Scenarios

Given all the information above, the build-cache extension is recommended to use in scenarios when productivity and
performance are in priority. Typical cases are:
Given all the information above, the build-cache extension is recommended for use in scenarios when productivity and
performance are high priorities. Typical cases are:

* Continuous integration. In conjunction with the remote cache, the extension could drastically reduce build times,
validate pull requests faster and reduce the load on CI nodes
* Speedup developer builds. By reusing cached builds, developers could verify changes much faster and be more
* Continuous integration. In conjunction with the remote cache, the extension can drastically reduce build times,
validate pull requests faster, and reduce the load on CI nodes.
* Speed up developer builds. By reusing cached builds, developers can verify changes faster and be more
productive.
No more `-DskipTests` and similar.
* Assemble artifacts faster. In some development models, it might be critical to have a build/deploy turnaround as fast
* Assemble artifacts faster. In some development models, it is critical to make the build/deploy turnaround as fast
as
possible. Caching helps to cut down time drastically in such scenarios because it doesn't require building cached
possible. Caching drastically cuts down build time because it doesn't build cached
dependencies.

For cases when users must ensure the correctness (e.g. prod builds), it is recommended to disable the cache and do clean