Add `rpm-ostree compose build-chunked-oci` #5221
A lot of technical debt here. A long time ago I added this hacky bit to inject `var/tmp` into the container stream even if it wasn't in the ostree commit. Today things shipped by `rpm-ostree compose image` like FCOS don't have `var/tmp` in the commit. But then more recently we started shipping `/var/tmp` in base images directly. Now I'm working on coreos/rpm-ostree#5221, where we're rechunking from a rootfs that does have `var/tmp`, and that ends up in the ostree commit. The path comparison here was wrong because the tar stream we generate has paths starting with `./`, and a literal comparison doesn't match: `./var/tmp` != `var/tmp`. Add a canonicalization helper and use it for this. Signed-off-by: Colin Walters <[email protected]>
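The fix described in the commit message can be sketched roughly as follows. This is a hypothetical illustration, not the actual rpm-ostree helper (the function name and exact normalization rules are assumptions): tar entries in the generated stream start with `./`, while ostree commit paths do not, so both sides are normalized before comparing.

```rust
// Hypothetical sketch of the canonicalization described above, NOT the
// actual rpm-ostree helper. Tar entries in the generated stream begin
// with "./", so strip that (and any leading "/") before comparing
// against ostree commit paths like "var/tmp".
fn canonicalize_path(path: &str) -> &str {
    path.trim_start_matches("./").trim_start_matches('/')
}
```

With this, `./var/tmp`, `/var/tmp`, and `var/tmp` all compare equal, which is exactly the mismatch the commit message describes.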
Closes: coreos#5221 Signed-off-by: Colin Walters <[email protected]>
Would it be possible to build a variant in ostree-rs-ext directly and allow using ostree-ext-cli instead as an option? It would remove the Fedora dependency for one, and for another, the chunking algorithm in rpm-ostree leaves something to be desired. Also, you need to make sure you allow the user to force hardlinks, or bail, when creating the ostree commit in the image, for space considerations (see https://gitlab.com/fedora/bootc/tracker/-/issues/32#note_2297736215). This way, the only additional copy created is the .oci artifact. Then, if the user wants to avoid the third and fourth copies, they can load the resulting OCI artifact and do the cleanup outside the container.
I am also looking at simplifying the rechunk action, and this would be a great opportunity to do it. It would solve the permissions issues, for one. Also note that this step is VERY performance sensitive, as it runs after every build and can take as much time as the build itself. For example, all the following lines do is add 5-6m to the build. Also, when you load an OCI image produced by ostree-rs-ext into podman, it grows by around half a gig in our use case due to inferior compression. Much cheaper to use skopeo to upload instead.
Hi @antheas, thanks for your comments! rechunk is very cool. However, a structural issue is that I need to ship something that "my" team maintains, lifecycled/versioned with the rest of our tools. Adding a new container image would be a nontrivial lift at this time, and the core part I care about is already mostly running ostree/rpm-ostree code anyways. We could ship the code in an existing container, but... A lot more on related bits below:

Yes, you're right that for larger images (usually desktops) we'll want something better and more configurable, as you've implemented in rechunk. My PoV here is to start by allowing users to not regress from the tooling we use to build fedora-bootc derivatives today.

**Bigger picture future**

On the bootc side, we already support ingesting images without ostree-in-container at all. The tooling here will default to generating that in order to maintain backwards compatibility. But as we think towards next steps, I think where we want to go is maintaining a "generic" rechunker that could also apply to docker/podman app images (or really even flatpak-OCI). I don't think it'd be really hard to take the code we have and slightly generalize it, dropping most of the ostree stuff (although we will want to support e.g. SELinux labeling and other things at build time). I'd love to collaborate on that in, say, a repo in github.com/containers - although we'd need to work through some details.

Back to details:

**Workflow**

Today rechunk is defined to run primarily as a GitHub action. While clean support of GHA is important, for what I ship it must be straightforward to use in as many container build environments as possible. This is why the design (in https://gitlab.com/fedora/bootc/tracker/-/issues/32#note_2295033343) starts by using a buildah/podman feature where we can have the build just be a Containerfile - so anything that knows how to consume that could operate on it. Now of course this is a podman-ecosystem specific feature (though nothing would stop us from arguing to add it to docker/moby). In the end, with what we have in this code, you aren't at all required to use that Containerfile flow - as you say, one can also write the ociarchive and then upload it from there instead of injecting into containers-storage. I'll look to document how to do that flow as well.

**Size issues**

Today we end up with 4 uncompressed copies at the end of the Containerfile flow (intermediate image, ostree repo, oci archive, containers-storage). I think with the ostree flow I can teach things so that we don't create a temporary ostree repo; we just compute the checksums in memory. That'd drop out one whole copy. We could probably teach podman to have something like:

I think the only way to then fix the "intermediate image" copy would be some explicit way to flush that out, like

(Making up a
Of course. I never said rechunk is enterprise ready. You should re-use your IP with rpm-ostree to make something that works today. I was just noting some roadblocks I faced and that you will inevitably face as well... in the process perhaps nudging some of those PRs towards ostree-rs-ext so I can also use them :)

The major two roadblocks we have faced are storage and speed. And I don't think those will be unique to GHA. Bazzite builds in 10m; rechunk takes 7m to run and 3m to upload. And I think those scale linearly with each other. So with rechunk we are at 2x. I think the bandwidth savings from rechunking make it worth it just from a deployment time perspective at 2x. However, if you're not careful and instead of 2x you start talking about 3x or 4x... it gets a bit harder to justify. Same with all your builders suddenly requiring 4x the storage.

There is also a third issue. Since you are not using rpm-ostree to unwrap the image as an ostree commit, you open a can of worms that has been haunting me. It's the

Hopefully, your ostree-rs-ext PRs 1-2 months ago helped with that, and now OCI-serialized commits have correct permissions.

In rechunk I really want to do a few things next time I clean it up in a few months. I want to keep or lower the 2x overhead while fixing the following. I want to hopefully remove the need for the
@vrothberg and I were chatting, and maybe one thing that could help here would be for us to ship the last half of the reference code as

or whatever; probably in practice they may not have
I thought about it some more. If the fix of ostree-rs-ext dropping special attrs, permissions, etc. is deployed, and we teach ostree to complete the steps that are leftover in

Then if it's also a memory model, we can drop the extra copy and maybe it will be faster.

As for the final copy re:

If all those are in place, I also think I will go this route and drop/simplify the GitHub action.

Sidenote: the rechunk algorithm is not something that would take more than a couple of days to rewrite in Rust, but I would still prefer you leave some leeway or avenue for prototyping/using something better.
Thank you. Indeed, skimming that, I see this approach is affected by several things. Taking just one of them, in the

Yes, the problem I think is that we're not synthesizing those xattrs into the tar stream even for the existing base image. I will look at doing this, though it opens up a bit more of a can of worms around
Wow, yes, tricky. There's clearly a lot to dig through there and unwind. I think a target should probably be to have a CI test which compares a base image built by
This is interesting. It's quite possible that the gzip level (or actually the vendored version of gzip) differs from what we happen to use in ocidir. Although, a big-picture issue there is that because gzip is pretty stable, we haven't changed much over time for layers. But the moment we rev the compression, it can blow out the cache entirely. cc docker-library/official-images#17720 (comment)
The rationale behind it is as follows:
Yeah this is interesting; comparing the container filesystem vs the rpmdb, but ignoring missing files and pure timestamp changes:
The things that are just

That said, there is a workaround we can do here, which is to pull that data from the source ostree commit (or from the rpmdb). But it shouldn't be too hard to get our base images rebuilt. The next thing I want to dig into here is the ones that are
It's `rust/src/importer.rs` line 313 (in 2f470db).
Is the main thing we're trying to avoid with this having to filter out mountpoints? The UX does suffer a bit as a result. We definitely should support the non-
Yes, definitely. It feels like that work could happen in buildah? (And be exposed via podman as well. And in line with the above, given that podman is currently guaranteed to be in your package set.)
That's part of it, but it's not hard to avoid crossing mountpoints. The real thing we want to support is shipping images that drop out rpm-ostree (or, in general, tools that we may use at build time to generate the OCI). The demonstration flow I have there shows that working. It'd likely be possible, of course, to require rpm-ostree (or whatever build tool) to have no external dependencies other than the most minimal base image (probably just glibc) and then one could

But yes... it probably wouldn't be too hard to make
Internal only change, but prep for future work; I want to track the version lints were introduced in, support clearly differentiating between warnings and fatal errors, etc. Specifically motivated by custom base image work coreos/rpm-ostree#5221 but also obviously useful in general. Signed-off-by: Colin Walters <[email protected]>
Could I ask where you envision this being used? Or would it also be useful for custom images?
No.
Yes. It's for creating derived images that "feel like" base images. A lot more in

Although, as the docs say, nothing technically requires you to run through a container derivation flow to create the rootfs - so you certainly can use this to make a base image how you want to, while still picking up some of the logic we have. Also, with bootc, especially after containers/bootc#887, nothing requires you to use this tool at all!

Yes, these are two of the issues that might drive one to use this approach.
I've been trying to play around with this recently, though I keep facing errors where the `build-chunked-oci` command throws an error about unsupported regular files being present.

Perhaps there are some validation steps which could be done beforehand (like as part of `bootc container lint` or the `build-chunked-oci` command) which alert the user to the full path of these unsupported files. At the moment, the error isn't the most useful, since I would have expected an
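As a rough illustration of what such a pre-flight validation could look like (purely hypothetical, not part of rpm-ostree or bootc; the function name and the set of "supported" file types here are assumptions), one could walk the rootfs and collect the full paths of entries that are neither regular files, directories, nor symlinks, so the user sees the offending paths up front:

```rust
use std::fs;
use std::io;
use std::path::Path;

// Hypothetical pre-flight check, NOT rpm-ostree's actual validation:
// recursively collect the full paths of entries that are neither regular
// files, directories, nor symlinks (e.g. sockets or FIFOs), so they can
// be reported to the user before running build-chunked-oci.
fn find_unsupported(dir: &Path, out: &mut Vec<String>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        // symlink_metadata() does not follow symlinks.
        let ft = entry.path().symlink_metadata()?.file_type();
        if ft.is_dir() {
            find_unsupported(&entry.path(), out)?;
        } else if !ft.is_file() && !ft.is_symlink() {
            out.push(entry.path().display().to_string());
        }
    }
    Ok(())
}
```

Running this against the rootfs before the compose step would surface exactly the paths the error message currently omits.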
xref https://gitlab.com/fedora/bootc/tracker/-/issues/32#note_2295033343

Basically we have the two core pieces of this in `ostree commit` + `rpm-ostree compose container-encapsulate`. However - doing this from the mounted rootfs gets ugly, as we obviously don't want to traverse into mount points like `/proc` - ostree never learned to skip mount points since we always told people to do the separate rootfs. What's worse is that things like `/tmp` (and `/var/tmp`) won't be mount points - but we definitely don't want to traverse into those either.

Hmm. A core thing driving us to the "nested container" was precisely so we could operate on a separate rootfs. But it'd clearly be really useful if we didn't need nested container perms.

I think these heuristics would be OK to start:

- `/boot` is nonempty (xref bootc container lints)
- `/var/tmp`? Obviously we could just try to avoid using `/var/tmp` ourselves, but that seems... messy to commit to.

Hmm, maybe it'd add some flexibility if we supported something like `setfattr -n user.rpmostree.skip -v 1` so that things could be omitted without actually being deleted at that time. Then in fact one could add that xattr even to `/usr/bin/rpm-ostree` to avoid shipping it. And while we're here, probably honor CACHEDIR.TAG.

EDIT: Actually of course it's possible to use `COPY --from=rootfs / /rootfs` or even better

to access the cleanly split out root from the mid-stage image. So let's just aim for that.
Dependencies:

- [ ] containers/bootc#1032
- [ ] #5225