Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fallback to containerd if we are unable to fetch SOCI artifacts #1302

Merged
merged 2 commits into from
Jul 19, 2024

Conversation

sondavidb
Copy link
Contributor

Issue #, if available:
Continuation on #1035
Fixes #1034

Description of changes:
Following up on the last comment on #1035, I made the change to not send a FUSE failure for this new error, and added an integration test.

From original PR:

If we know that we cannot lazy load an image (eg: SOCI index does not exist for an image), we should fallback to the underlying runtime to do the fetching/unpacking of all layers.

Testing performed:
make test && make integration

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@sondavidb sondavidb requested a review from a team as a code owner July 3, 2024 21:23
@github-actions github-actions bot added go Pull requests that update Go code testing Unit and/or integration tests labels Jul 3, 2024
If we know that we cannot lazy load an image (eg: SOCI index does not
exist for an image), we should fallback to the underlying runtime to
do the fetching/unpacking of all layers.

Signed-off-by: Yasin Turan <[email protected]>
@sondavidb sondavidb force-pushed the early_defer_no_index branch from 42e26d4 to 1f83ac1 Compare July 3, 2024 21:29
@sondavidb
Copy link
Contributor Author

Huh, interesting. During testing I'm finding out that every other mount will message "deferring to container runtime", instead of just the first or just every mount. Wonder if this could be pointing to an issue.

@sondavidb
Copy link
Contributor Author

Seeing that TestOverlayFallbackMetric (specifically this case) is now failing due to this behavior. The problem is, with this behavior I'm not sure we can capture how many layers are falling back to overlayfs, as only every other layer seems to traverse this. Though, perhaps the answer is that we shouldn't update the metric? But that doesn't seem right.

@sondavidb sondavidb marked this pull request as draft July 8, 2024 17:27
@sondavidb sondavidb force-pushed the early_defer_no_index branch 3 times, most recently from 60a2c67 to ec59091 Compare July 8, 2024 23:06
@sondavidb sondavidb marked this pull request as ready for review July 8, 2024 23:34
@sondavidb
Copy link
Contributor Author

Fixed up the tests to pass. We decided that not sending an OverlayFallback metric made sense due to this being intended behavior, and the metric in question tracks unexpected overlay fallback metrics. This does have the small caveat that an invalid specified SOCI index digest will not emit this metric, but realistically very few people, if any, are actually manually specifying SOCI index digests, so it's probably fine as-is.

For the failing test, it was expecting overlay fallbacks on a bad zTOC, which resulted in the above edge case happening. I refactored the test to corrupt a zTOC in a somewhat hacky way, but it seems to work :P

I previously reported that performance didn't seem to increase, but as I was testing with an image with only one large layer, the lack of parallelization would make a minimal difference in that case. In images with multiple large layers this will heavily increase performance.

There's still one more caveat here, which is that, per the logs, it seems to use double the number of mounts actually required to set up the snapshot. But, upon inspecting the mounts (/var/lib/soci-snapshotter-grpc/snapshotter/snapshots), and checking the content store, it does not store double the number of blobs, so I don't believe this to be an issue, unless I'm looking in the wrong place.

@sondavidb sondavidb force-pushed the early_defer_no_index branch from ec59091 to 2ccf297 Compare July 12, 2024 20:14
@sondavidb sondavidb force-pushed the early_defer_no_index branch from 2ccf297 to 66e2bcb Compare July 15, 2024 22:22
Copy link
Contributor

@Kern-- Kern-- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the whole Prepare method could use a more structured approach to refactoring to make it less complex and to fix all the log noise it produces.

I don't want to block getting this in though, so I'll create an issue for it.

@sondavidb sondavidb merged commit 846397c into awslabs:main Jul 19, 2024
11 checks passed
@sondavidb sondavidb deleted the early_defer_no_index branch July 22, 2024 17:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
go Pull requests that update Go code testing Unit and/or integration tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEATURE] Fallback to containerd if we cannot lazy load image
4 participants