WIP: Cargo support
First iteration.

Signed-off-by: Alexey Ovchinnikov <[email protected]>
a-ovchinnikov committed Jan 21, 2025
1 parent ce4d758 commit 3c48165
Showing 1 changed file with 79 additions and 52 deletions.
131 changes: 79 additions & 52 deletions docs/designs/cargo-support.md
@@ -1,6 +1,12 @@
# Cargo overview
# Adding Cargo support to Cachi2

## Main files
## Background

[Cargo] is the package manager of choice for the [Rust] programming language.
It handles building Rust projects as well as retrieving and building their
dependencies. Cargo can be further extended with plugins.

A typical Cargo-managed project has the following structure:

```
├── .cargo
@@ -11,9 +17,9 @@
└── main.rs (or lib.rs)
```

- Cargo.toml: dependency listing and project configuration.
- Cargo.lock: lockfile that contains the latest resolved dependencies.
- .cargo/config.toml: package manager specific configuration.
`Cargo.toml` contains the dependency listing and project configuration,
`Cargo.lock` is a lockfile that contains the latest resolved dependencies,
and `.cargo/config.toml` holds package-manager-specific configuration.

### Glossary

@@ -23,8 +29,11 @@ file.

## [Specifying dependencies](https://doc.rust-lang.org/cargo/reference/specifying-dependencies.html)

The examples below show what types of dependencies Cargo supports, and how they can be specified in the
`Cargo.toml` file.
Cargo supports several types of dependencies: crates distributed through registries,
Git repositories, and filesystem paths.

The examples below show the different types of dependencies Cargo supports, and
how they can be specified in the `Cargo.toml` file.

<details>
<summary>default registry (crates.io)</summary>
@@ -80,10 +89,10 @@ The examples below show what types of dependencies Cargo supports, and how they

<details>
<summary>platform specific</summary>
Note: in cargo docs, "platform" refers interchangeably to both architecture and OS
Cargo has support for specifying dependencies under a certain platform with `#[cfg]`
syntax:

TODO
- note: in cargo docs, "platform" refers interchangeably to both architecture and OS
- cargo has support for specifying dependencies under a certain platform, like
```
[target.'cfg(windows)'.dependencies]
winhttp = "0.4.0"
@@ -98,10 +107,11 @@ The examples below show what types of dependencies Cargo supports, and how they

[target.i686-unknown-linux-gnu.dependencies]
openssl = "1.0.1"
``
- Regardless, as far as we could tell from experimenting, cargo build requires ALL dependencies to be present - even if they won't be used.
- as a potential optimization, [cargo-vendor-filterer](https://github.com/coreos/cargo-vendor-filterer/) can vendor cargo dependencies
- if we adopt this approach, it might be limited to pure-rust builds
```

`cargo build` apparently requires all dependencies to be present, even if they won't be used.
This was determined experimentally and is in line with the behavior of other package
managers (see, for example, the related section in the [Bundler] documentation).
</details>

<details>
@@ -119,12 +129,6 @@ The examples below show what types of dependencies Cargo supports, and how they
```
</details>

<details>
<summary>platform specific</summary>

TODO
</details>

<details>
<summary>alternative registry</summary>

@@ -137,10 +141,13 @@ The examples below show what types of dependencies Cargo supports, and how they
```
</details>

All the dependency types mentioned above are supported by Cargo out of the
box, with either no or minimal additional setup.

### Cargo.lock

The `Cargo.lock` file follows the toml format. Here are some examples of how dependencies are
represented in it.
The `Cargo.lock` file follows the TOML format. Below are some examples of how
dependencies are represented in it.

<details>
<summary>main or local package</summary>
@@ -248,22 +255,31 @@ file or via the `cargo metadata` command.
```
</details>

## Features
## [Features](https://doc.rust-lang.org/cargo/reference/features.html)

*TODO*
Features allow conditional compilation of projects. From Cachi2's perspective, the
most important aspect of features is optional dependencies. An optional dependency
is a dependency that will not be processed unless explicitly requested.
The safest way to deal with optional dependencies in the context of hermetic builds
would be to use the `--all-features` flag with cargo commands when prefetching dependencies.

## [Build Scripts](https://doc.rust-lang.org/cargo/reference/build-scripts.html)

Any package that contains a `build.rs` file in it's root will have it executed during build-time.
Note that this does not happen in any other stage, such as during vendoring or dependency fetching.
Any package that contains a `build.rs` file in its root will have it executed
during build-time. Note that this does not happen in any other stage, such as
during vendoring or dependency fetching. The build script can contain
arbitrary code, and not running it could result in a failed build. Moreover, a
[plugin](https://embarkstudios.github.io/cargo-deny/) is necessary to skip
build scripts.

## [Vendoring](https://doc.rust-lang.org/cargo/commands/cargo-vendor.html)

Cargo offers the option to vendor the dependencies by using `cargo vendor`. All dependencies
(including git dependencies) are downloaded to the `./vendor` folder by default.
Cargo offers the option to vendor the dependencies by using `cargo vendor`. All
dependencies (including git dependencies) are downloaded to the `./vendor`
folder by default.

The command also prints the required configuration that needs to be added to `.cargo/config.toml`
in order for the offline compilation to work. Here's an example:
The command also prints the required configuration that needs to be added to
`.cargo/config.toml` in order for the offline compilation to work:

```toml
[source.crates-io]
@@ -284,7 +300,7 @@ Also, vendoring does not trigger any builds scripts.

# Cargo support in Cachi2

## Approach 1: use cargo commands
## Approach 1 (preferred): use cargo commands

### Identifying the dependencies

@@ -453,7 +469,7 @@ This way, we can identify path and git dependencies, as well as the main package
fetched from non-default registries.

Dev and build dependencies have respective `kind`s when listed in the nested `.dependencies` key.
To indentify them and mark them as such in the SBOM, we'd need only to check all the times a single
To identify them and mark them as such in the SBOM, we'd need only to check all the times a single
package appears as a transitive dependency in this output.
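
The classification described above could be sketched roughly as follows. This is a simplified illustration rather than the final implementation: it matches packages by name only (ignoring versions), the function name is hypothetical, and the sample data is a hand-written excerpt shaped like `cargo metadata` output (`source` is `null` for local packages, `kind` is `null` for normal dependencies).

```python
import json


def classify_packages(metadata: dict) -> dict[str, dict]:
    """Summarize the source type and dependency kinds of each package, by name."""
    # Collect every `kind` under which a package name is requested.
    kinds: dict[str, set[str]] = {}
    for package in metadata["packages"]:
        for dep in package.get("dependencies", []):
            # `kind` is null for normal dependencies, otherwise "dev" or "build".
            kinds.setdefault(dep["name"], set()).add(dep.get("kind") or "normal")

    summary = {}
    for package in metadata["packages"]:
        source = package.get("source")
        if source is None:
            source_type = "path"  # main package or path dependency
        elif source.startswith("git+"):
            source_type = "git"
        else:
            source_type = "registry"
        summary[package["name"]] = {
            "source_type": source_type,
            "kinds": sorted(kinds.get(package["name"], set())),
        }
    return summary


# Hand-written sample; the git revision below is made up.
SAMPLE = json.loads("""
{
  "packages": [
    {"name": "app", "source": null, "dependencies": [
      {"name": "serde", "kind": null},
      {"name": "tempfile", "kind": "dev"},
      {"name": "cc", "kind": "build"}
    ]},
    {"name": "serde", "source": "registry+https://github.com/rust-lang/crates.io-index", "dependencies": []},
    {"name": "tempfile", "source": "registry+https://github.com/rust-lang/crates.io-index", "dependencies": []},
    {"name": "cc", "source": "git+https://github.com/rust-lang/cc-rs#0123abc", "dependencies": []}
  ]
}
""")

result = classify_packages(SAMPLE)
```

A package whose `kinds` list contains only `dev` (or only `build`) is a candidate for being marked as such in the SBOM.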

### Prefetching
@@ -481,13 +497,13 @@ Cons:
- Relying on a built-in command brings its own disadvantages:
- We have less control over what will be executed when invoking `cargo` commands
- We need to account for cargo behavior changes more closely
- We need install cargo in the Cachi2 image and keep its version up to date
- We need to install cargo in the Cachi2 image and keep its version up to date
- Might make it harder to build Pip+Rust projects
- Cargo will refuse to vendor an empty directory with a single `Cargo.toml` file, which
means we'd need to provide at least a minimal `src/main.rs` file to it.
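
The last point could be handled with a small wrapper along these lines. This is only a sketch under stated assumptions: the helper names are hypothetical, `--locked` assumes a `Cargo.lock` is already present, and the stub is written only when neither `src/main.rs` nor `src/lib.rs` exists.

```python
import subprocess
from pathlib import Path


def ensure_stub_source(project_dir: Path) -> bool:
    """Create a minimal `src/main.rs` if the project has no source file yet.

    Returns True when a stub was written, False when nothing was needed.
    """
    src_dir = project_dir / "src"
    main_rs = src_dir / "main.rs"
    if main_rs.exists() or (src_dir / "lib.rs").exists():
        return False
    src_dir.mkdir(parents=True, exist_ok=True)
    main_rs.write_text("fn main() {}\n")
    return True


def vendor_command(project_dir: Path, output_dir: Path) -> list[str]:
    """Build the `cargo vendor` invocation without running it."""
    return [
        "cargo", "vendor",
        "--locked",  # fail instead of silently updating Cargo.lock
        "--manifest-path", str(project_dir / "Cargo.toml"),
        str(output_dir),
    ]


def vendor(project_dir: Path, output_dir: Path) -> str:
    """Run `cargo vendor` and return the configuration snippet it prints."""
    ensure_stub_source(project_dir)
    completed = subprocess.run(
        vendor_command(project_dir, output_dir),
        check=True, capture_output=True, text=True,
    )
    return completed.stdout
```

Keeping the command construction separate from its execution makes it possible to unit-test the wrapper without cargo being installed.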


## Approach 2: manually fetching the dependencies
## Approach 2 (alternative): manually fetching the dependencies

### Identifying the dependencies

@@ -592,18 +608,26 @@ Cons:
- Checksum files need to be manually generated
- Sub-packages in git dependencies need to be moved to a flat structure
- The "vendor" configuration needs to be generated manually
- Extra maintenance burden for Cargo.lock parser
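
To give a sense of the manual work involved, here is a rough sketch of generating a checksum file for one vendored crate. It assumes the `.cargo-checksum.json` layout that `cargo vendor` appears to produce (a `files` map of relative path to SHA-256 plus a `package` checksum, `null` for git dependencies); this layout and the helper names are assumptions to be verified against real cargo output.

```python
import hashlib
import json
from pathlib import Path
from typing import Optional


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def write_cargo_checksum(crate_dir: Path, package_checksum: Optional[str]) -> Path:
    """Write `.cargo-checksum.json` covering every file under `crate_dir`."""
    files = {}
    for path in sorted(crate_dir.rglob("*")):
        # Skip directories and the checksum file itself.
        if path.is_file() and path.name != ".cargo-checksum.json":
            files[path.relative_to(crate_dir).as_posix()] = sha256_of(path)
    checksum_file = crate_dir / ".cargo-checksum.json"
    checksum_file.write_text(
        json.dumps({"files": files, "package": package_checksum})
    )
    return checksum_file
```

For registry crates, `package_checksum` would presumably be the `checksum` value recorded in `Cargo.lock`.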

## Decision

# Caveats
Given the rich set of features provided by Cargo for managing dependencies, it is more
cost-effective to rely on Cargo for performing all the necessary parsing and fetching.
This decision is in line with the current approach to other package managers (e.g. Bundler or
Yarn).

## Crates with binaries

## Appendix A. Crates with binaries

Crates are supposed to contain only source code. However, crates.io doesn't seem to enforce any
rule prohibiting crates from being uploaded with binaries. This happened at least once with
[serde][serde-with-binaries], one of the most popular Rust libraries.

# Pip + Cargo support in Cachi2
## Appendix B. Pip + Cargo support in Cachi2


## Context
### Context

Traditionally, performance bottlenecks in the python ecosystem are addressed with C extensions,
which introduce their own complexities and safety concerns.
@@ -620,7 +644,15 @@ Addressing the integration challenges of Rust in Python projects is crucial to e
performance, safety, and concurrency of Python applications. The "rustification" of Python libraries
is here to stay.

## The challenge and cachi2 boundaries
On the other hand, Cachi2 in its current shape does not mandate the presence of the sources for all
dependencies. For example, both Bundler and Pip will ignore all binary dependencies unless
requested otherwise. Once requested, only the binaries themselves will be collected
during the prefetch phase. They will be reported in the SBOM as regular packages. Making
fully self-contained builds is a larger topic and is out of scope for this document.
A general description of how this could be achieved for Python packages depending on Rust
is presented in this section.

### The challenge and cachi2 boundaries

Building projects that DIRECTLY depend on both Rust and Python should be straightforward and
similar to building with pip and cargo independently. The developers of those projects can easily
@@ -643,7 +675,7 @@ the package indirectly depends or a file format designed for this. Also [pybuild
might evolve to help solve this problem, so we would not be wasting any time by understanding
these problems.

## Build dependencies
### Build dependencies

`maturin` and `setuptools-rust` are PEP517 compliant build backends for python packages with
embedded rust code.
@@ -652,7 +684,7 @@ Under the hood, `maturin` relies exclusively on `PyO3` while `setuptools-rust` c
or `Rust-CPython` (but newer projects likely prefer the former, as `Rust-CPython`
development is halted and its author recommends `PyO3`).

## Detecting python packages with rust dependencies
### Detecting python packages with rust dependencies

We could use the presence of either `maturin` or `setuptools-rust` as build dependencies of a python
package as a heuristic to determine if a package is a python+rust library. Alternatively, we could
@@ -669,7 +701,7 @@ Packages relying on `maturin` and `setuptools` have a default place to have thei
Cargo.toml/lock stored. Also, by parsing the configuration it is possible to know whether the path to those
manifests was modified.

### maturin
#### maturin
Detecting `maturin` is easier because it only supports python packages that use `pyproject.toml`
to configure it. So detecting its presence is only a matter of verifying if
`[build-system].requires` contains `maturin`.
@@ -696,7 +728,7 @@ example:
manifest-path = "Cargo.toml"
```

### setuptools-rust
#### setuptools-rust

The oldest versions of `setuptools-rust` support only `setup.py`, but since version 1.7.0 it also
supports `pyproject.toml`.
@@ -756,7 +788,7 @@ setup(
)
```

## Vendoring rust dependencies
### Vendoring rust dependencies

Even though `cargo vendor` only requires `Cargo.toml` (and optionally, but ideally for reproducible
builds, `Cargo.lock`), it will fail without source code present. If it wasn't for this, manifest
@@ -781,7 +813,7 @@ If we go in that direction, we could even go one step further and expect a speci
python+rust dependencies. This would allow customers to include only a file like
`rust-requirements.txt/toml/json/etc`.

## Hermetically build python + rust libraries
### Hermetically build python + rust libraries

Both `maturin` and `setuptools-rust` will, somehow, invoke cargo during the build process. For this
reason, we can leverage the way cargo is configured to look for vendored packages.
@@ -790,11 +822,6 @@ In order to do that, we need:
1. A folder with all vendored crates
2. A .cargo/config.toml [[link to the section in the document where this is explained]] overriding
crates.io source with the path to vendored dependencies.
<!--
# TODO: move this to a appropriate section, as this configuration is the same for pure rust
# note, tho, that on pure rust we might need to ADD this info to a existing file instead of writing
# one from scratch.
-->
The config file looks like the following:
```toml
[source.crates-io]
@@ -825,7 +852,7 @@ RUN source /tmp/cachi2.env && \

```
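
The `.cargo/config.toml` step from the list above could be sketched as follows. This is an illustration under assumptions: the helper name is hypothetical, the source-replacement keys mirror what `cargo vendor` prints, and an existing config file is appended to rather than overwritten (the sketch refuses to proceed if crates.io is already replaced).

```python
from pathlib import Path

VENDOR_CONFIG = """\
[source.crates-io]
replace-with = "vendored-sources"

[source.vendored-sources]
directory = "{vendor_dir}"
"""


def write_vendor_config(project_dir: Path, vendor_dir: Path) -> Path:
    """Add the vendored-sources override to `.cargo/config.toml`."""
    config_path = project_dir / ".cargo" / "config.toml"
    config_path.parent.mkdir(parents=True, exist_ok=True)

    existing = config_path.read_text() if config_path.exists() else ""
    if "[source.crates-io]" in existing:
        # Merging with an existing override is out of scope for this sketch.
        raise ValueError(f"{config_path} already replaces the crates-io source")
    if existing and not existing.endswith("\n"):
        existing += "\n"

    # Assumes the vendor path contains no characters that need TOML escaping.
    snippet = VENDOR_CONFIG.format(vendor_dir=vendor_dir.as_posix())
    config_path.write_text(existing + snippet)
    return config_path
```

Appending keeps any project-specific settings intact, which matters for pure Rust projects that already ship a `.cargo/config.toml`.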

### Limitations
#### Limitations

- The process likely won't work with python packages lacking Cargo.lock.
- Interestingly, while inspecting some projects relying on maturin I saw many that didn't have a
@@ -846,4 +873,4 @@ created.
[ccs-inline-github]: https://github.com/Stranger6667/css-inline/tree/wasm-v0.11.2/bindings/python
[serde-with-binaries]: https://www.bleepingcomputer.com/news/security/rust-devs-push-back-as-serde-project-ships-precompiled-binaries/
[pybuild-deps]: https://pybuild-deps.readthedocs.io/en/latest/
[python-rust-research]: https://github.com/bruno-fs/python-rust-research/blob/afebfc7ab6ef55aa0db6879b0cda7760373b60cd/python-rusty-exploration.ipynb
