-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clean up after unexpectedly terminated build #25102
base: main
Are you sure you want to change the base?
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: Honny1 The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
596c6bb
to
c607887
Compare
libpod/runtime.go
Outdated
if err != nil { | ||
return stageContainersPruneReports, err | ||
} | ||
if _, err := os.Stat(filepath.Join(path, "buildah.json")); errors.Is(err, fs.ErrNotExist) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use fileutils.Exists()
@nalind Is there another (better?) way to check if a storage container is from buildah?
libpod/runtime.go
Outdated
size, err := r.store.ContainerSize(container.ID) | ||
if err != nil { | ||
report.Err = err | ||
logrus.Warnf("Failed to get size of build stage container %s: %v", container.ID, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do not report the error to the caller and also print a warning.
Just reporting the errors back to the caller is good enough. The caller can then log once.
libpod/runtime.go
Outdated
report.Err = err | ||
logrus.Warnf("Failed to remove build stage container %s: %v", container.ID, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
pkg/domain/infra/abi/system.go
Outdated
|
||
reclaimedSpace := (uint64)(0) | ||
found := true | ||
for found { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this a loop? Running once should be enough.
pkg/domain/infra/abi/system.go
Outdated
stageContainersPruneReports, err := ic.Libpod.PruneStageContainers() | ||
if err != nil { | ||
return nil, err | ||
} | ||
if len(stageContainersPruneReports) > 0 { | ||
found = true | ||
} | ||
reclaimedSpace += reports.PruneReportsSize(stageContainersPruneReports) | ||
systemPruneReport.ContainerPruneReports = append(systemPruneReport.ContainerPruneReports, stageContainersPruneReports...) | ||
|
||
// Prune Images | ||
imagePruneOptions := entities.ImagePruneOptions{ | ||
External: true, | ||
BuildCache: true, | ||
} | ||
imageEngine := ImageEngine{Libpod: ic.Libpod} | ||
imagePruneReports, err := imageEngine.Prune(ctx, imagePruneOptions) | ||
if err != nil { | ||
return nil, err | ||
} | ||
if len(imagePruneReports) > 0 { | ||
found = true | ||
} | ||
reclaimedSpace += reports.PruneReportsSize(imagePruneReports) | ||
systemPruneReport.ImagePruneReports = append(systemPruneReport.ImagePruneReports, imagePruneReports...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems inconsistent with the documentation, from the docs you wrote it sounds like it only prune the build containers but this prunes everything other containers and images as well.
I don't think this is desirable.
If we want that behavior then the code should not cause a conflict with --build option and only do this in addition to the existing code in SystemPrune() instead of duplicating the image logic here.
test/e2e/prune_test.go
Outdated
hasNone, result := none.GrepString("<none>") | ||
Expect(result).To(HaveLen(1)) | ||
Expect(hasNone).To(BeTrue()) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This leads to horrible Expected false to be true errors.
Please use the proper matchers, something like Expect(none.OutputToString()).To(ContanSubstring("none"))
test/e2e/prune_test.go
Outdated
dirents, err := os.ReadDir(containerStorageDir) | ||
Expect(err).ToNot(HaveOccurred()) | ||
Expect(dirents).To(HaveLen(6)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this matter? This peaks at rather internal storage details which can change requiring the test to be updated often. Would it not be better to run buildah containers
instead? Then in the end ensure it is removed there?
test/e2e/prune_test.go
Outdated
after := podmanTest.Podman([]string{"images", "-a"}) | ||
after.WaitWithDefaultTimeout() | ||
Expect(after).Should(ExitCleanly()) | ||
Expect(len(after.OutputToStringArray())).To(BeNumerically(">", 1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what this should check, AFAIK in the test/e2e setup we will always have extra images from the additional store shown.
test/e2e/prune_test.go
Outdated
hasNoneAfter, result := after.GrepString("<none>") | ||
Expect(result).To(BeEmpty()) | ||
Expect(hasNoneAfter).To(BeFalse()) | ||
hasNotLeakerImager, _ := after.GrepString("notleaker") | ||
Expect(hasNotLeakerImager).To(BeTrue()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also needs to use proper matchers
test/e2e/prune_test.go
Outdated
// still have: volatile-containers.json, containers.json, containers.lock and container dir | ||
dirents, err = os.ReadDir(containerStorageDir) | ||
Expect(err).ToNot(HaveOccurred()) | ||
Expect(dirents).To(HaveLen(4)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment I would use buildah containers
to check
Also what this doesn't answer what happens if the build process is still running? And another thing that this does not cleanup is the networking, not sure if there is a good way for that but if you kill the build and run with bridge networking the interfaces/firewall rules and ipam db allocations are still leaked. They should be cleaned after reboot so not that bad like the storage leak and certainly not a blocker for this here. I just mention it as there are other leaks to consider too. |
@Luap99 I have incorporated your review. I've changed the approach and the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, code wise this seems good to me. just a minor comment on the man page and we need to make sure the test does not run forever
|
||
Removes any build containers that were created during the build, but were not removed because the build was unexpectedly terminated. | ||
|
||
> **This is not safe operation and should be executed only when no builds are in progress. It can interfere with builds in progress.** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the general style in the man pages was to write Note: ...
or something like that. I don't think we have used >
elsewhere. I don't mind it for the web view but I have not yet looked at how it looks in the rendered man page.
test/e2e/prune_test.go
Outdated
if build.LineInOutputContains("Please use signal 9") { | ||
break | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is great but maybe add a small sleep for like 10ms or something to not busy poll which also eats a lot of resources.
Second this needs a timeout, if the build process fails and never prints the output this would loop forever causing hard to debug issues in CI.
The `podman system prune` command is able to remove build containers that were created during the build, but were not removed because the build terminated unexpectedly. By default, build containers are not removed to prevent interference with builds in progress. Use the **--build** flag when running the command to remove build containers as well. Fixes: https://issues.redhat.com/browse/RHEL-62009 Signed-off-by: Jan Rodák <[email protected]>
The
podman system prune
command can remove build containers that were created during the build but were not removed because the build terminated unexpectedly.By default, build containers are not removed to prevent interference with builds in progress. Use the --build flag when running the command to remove build containers as well.
Reproducer:
run.sh
:podman unshare du -sh ~/.local/share/containers/
podman unshare du -sh ~/.local/share/containers/
podman unshare du -sh ~/.local/share/containers/
Fixes: #14523
Fixes: #23683
Fixes: https://issues.redhat.com/browse/RHEL-62009
Does this PR introduce a user-facing change?