Fallback to containerd if we are unable to fetch SOCI artifacts #1302
If we know that we cannot lazy load an image (e.g., no SOCI index exists for the image), we should fall back to the underlying runtime to fetch and unpack all layers. Signed-off-by: Yasin Turan <[email protected]>
Huh, interesting. During testing I'm finding that every other mount logs "deferring to container runtime", instead of just the first mount or every mount. I wonder if this points to an issue.
Seeing that TestOverlayFallbackMetric (specifically this case) is now failing due to this behavior. The problem is that, with this behavior, I'm not sure we can capture how many layers are falling back to overlayfs, as only every other layer seems to go through this path. Perhaps the answer is that we shouldn't update the metric? But that doesn't seem right.
Fixed up the tests to pass. We decided that not sending an OverlayFallback metric makes sense because this is intended behavior, and the metric in question tracks unexpected overlay fallbacks. This does have the small caveat that an invalid specified SOCI index digest will not emit this metric, but realistically very few people, if any, are manually specifying SOCI index digests, so it's probably fine as-is.
For the failing test, it was expecting overlay fallbacks on a bad zTOC, which resulted in the above edge case happening. I refactored the test to corrupt a zTOC in a somewhat hacky way, but it seems to work :P
I previously reported that performance didn't seem to increase, but I was testing with an image that has only one large layer, where there is little to parallelize and the difference is minimal. In images with multiple large layers this will heavily increase performance.
There's still one more caveat here: per the logs, it seems to use double the number of mounts actually required to set up the snapshot. But, upon inspecting the mounts (
Signed-off-by: David Son <[email protected]>
I think the whole Prepare method could use a more structured refactoring to make it less complex and to fix all the log noise it produces.
I don't want to block getting this in though, so I'll create an issue for it.
Issue #, if available:
Continuation on #1035
Fixes #1034
Description of changes:
Following up on the last comment on #1035, I made the change to not send a FUSE failure for this new error, and added an integration test.
From original PR:
If we know that we cannot lazy load an image (e.g., no SOCI index exists for the image), we should fall back to the underlying runtime to fetch and unpack all layers.
Testing performed:
make test && make integration
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.