-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fallback to containerd if we are unable to fetch SOCI artifacts #1035
Conversation
If we know that we cannot lazy load an image (eg: SOCI index does not exist for an image), we should fallback to the underlying runtime to do the fetching/unpacking of all layers. Signed-off-by: Yasin Turan <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, from what I understand since we currently just use all local mounts in this use case, it makes sense to defer to container runtime.
Definitely not in the scope of this PR, but wonder if we'd ever want behavior where we would want a toggle to attempt to auto-generate a SOCI index if one isn't found. We need a SOCI index to do any work so it might be something nice to have. Just a thought for now though.
This is probably useful even when an image has been already downloaded before soci-snapshotter was enabled, right? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall the changes LGTM. A few questions I had:
- Are we able to mock the ErrUnableToLazyLoad returned from the filesystem mount in snapshotter unit tests to validate a fallback occurs?
- Wondering if we should emit a metric on this case where SOCI artifacts exist but we failed to pull them. My thought is it might not be worth tracking because we have the log. Curious if you had some thoughts on it.
- Just want to validate an integration test here is difficult because we'd need a method for pulling an image with SOCI artifacts but an error occurs while pulling. e.g. network error.
We do a similar thing in TestNetworkRetry, where we pull an image from a repo, run the image with SOCI, attempt to do run a command, then block network access to the registry and try again. TestNetworkRetry specifically is not implemented correctly, though, and I'm not sure how we would simulate this, but the idea is there. Does this need to be an integration test though? Is there any way we can make it a unit test, e.g. by calling Mount with a bunch of local mounts? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Prepare
is in need of some rethinking/refactoring to make it more clear what the logic actually is and how that logic appears in logs.
For example, we now have both skipLazyLoadingImage
and o.skipRemoteSnapshotPrepare
. If we skip lazy loading an image, we will log "failed to prepare remote snapshot" and "failed to prepare snapshot; deferring to container runtime" for every layer which doesn't really convey what's actually going on.
Should we also fall back if a bad SOCI index digest is passed in? Right now, pulling while specifying a bad index digest ends up having a similar issue, so unless we want to switch between SOCI indexes for different layers of the same image (which I really don't think we do), it might be a nice thing to add. |
@dims Sorry for the delay. All this change will do is ensure that we defer back to containerd earlier when we know we can't lazy load the image. If the image has already been downloaded and the blob content exists in the containerd content store, this will ensure that SOCI won't blindly attempt to sequentially fetch each layer again. |
We defer back to containerd anytime we can't fetch SOCI artifacts, whether it's because they don't exist (a bad digest) or if there were network issues. |
TODO:
|
Fixed as part of #1302 |
If we know that we cannot lazy load an image (eg: SOCI index does not exist for an image), we should fallback to the underlying runtime to do the fetching/unpacking of all layers.
Issue #, if available:
Fixes: #1034
Description of changes:
Testing performed:
make test && make integration
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.