Fallback to containerd if we are unable to fetch SOCI artifacts #1302

sondavidb · 2024-07-03T21:23:45Z

Issue #, if available:
Continuation on #1035
Fixes #1034

Description of changes:
Following up on the last comment on #1035, I made the change to not send a FUSE failure for this new error, and added an integration test.

From original PR:

If we know that we cannot lazy load an image (eg: SOCI index does not exist for an image), we should fallback to the underlying runtime to do the fetching/unpacking of all layers.

Testing performed:
make test && make integration

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

If we know that we cannot lazy load an image (eg: SOCI index does not exist for an image), we should fallback to the underlying runtime to do the fetching/unpacking of all layers. Signed-off-by: Yasin Turan <[email protected]>

sondavidb · 2024-07-03T21:39:20Z

Huh, interesting. During testing I'm finding out that every other mount will message "deferring to container runtime", instead of just the first or just every mount. Wonder if this could be pointing to an issue.

sondavidb · 2024-07-08T17:26:36Z

Seeing that TestOverlayFallbackMetric (specifically this case) is now failing due to this behavior. The problem is, with this behavior I'm not sure we can capture how many layers are falling back to overlayfs, as only every other layer seems to traverse this. Though, perhaps the answer is that we shouldn't update the metric? But that doesn't seem right.

sondavidb · 2024-07-09T00:11:16Z

Fixed up the tests to pass. We decided that not sending an OverlayFallback metric made sense due to this being intended behavior, and the metric in question tracks unexpected overlay fallback metrics. This does have the small caveat that an invalid specified SOCI index digest will not emit this metric, but realistically very few people, if any, are actually manually specifying SOCI index digests, so it's probably fine as-is.

For the failing test, it was expecting overlay fallbacks on a bad zTOC, which resulted in the above edge case happening. I refactored the test to corrupt a zTOC in a somewhat hacky way, but it seems to work :P

I previously reported that performance didn't seem to increase, but as I was testing with an image with only one large layer, the lack of parallelization would make a minimal difference in that case. In images with multiple large layers this will heavily increase performance.

There's still one more caveat here, which is that, per the logs, it seems to use double the number of mounts actually required to set up the snapshot. But, upon inspecting the mounts (/var/lib/soci-snapshotter-grpc/snapshotter/snapshots), and checking the content store, it does not store double the number of blobs, so I don't believe this to be an issue, unless I'm looking in the wrong place.

integration/metrics_test.go

snapshot/snapshot.go

fs/fs.go

snapshot/snapshot.go

Signed-off-by: David Son <[email protected]>

Kern--

I think the whole Prepare method could use a more structured approach to refactoring to make it less complex and to fix all the log noise it produces.

I don't want to block getting this in though, so I'll create an issue for it.

sondavidb requested a review from a team as a code owner July 3, 2024 21:23

github-actions bot added go Pull requests that update Go code testing Unit and/or integration tests labels Jul 3, 2024

sondavidb force-pushed the early_defer_no_index branch from 42e26d4 to 1f83ac1 Compare July 3, 2024 21:29

sondavidb marked this pull request as draft July 8, 2024 17:27

sondavidb force-pushed the early_defer_no_index branch 3 times, most recently from 60a2c67 to ec59091 Compare July 8, 2024 23:06

sondavidb marked this pull request as ready for review July 8, 2024 23:34

Kern-- reviewed Jul 11, 2024

View reviewed changes

integration/metrics_test.go Outdated Show resolved Hide resolved

snapshot/snapshot.go Outdated Show resolved Hide resolved

fs/fs.go Outdated Show resolved Hide resolved

sondavidb force-pushed the early_defer_no_index branch from ec59091 to 2ccf297 Compare July 12, 2024 20:14

austinvazquez reviewed Jul 15, 2024

View reviewed changes

snapshot/snapshot.go Outdated Show resolved Hide resolved

snapshot/snapshot.go Outdated Show resolved Hide resolved

snapshot/snapshot.go Outdated Show resolved Hide resolved

snapshot/snapshot.go Outdated Show resolved Hide resolved

Add integration test for no image index

66e2bcb

Signed-off-by: David Son <[email protected]>

sondavidb force-pushed the early_defer_no_index branch from 2ccf297 to 66e2bcb Compare July 15, 2024 22:22

Kern-- approved these changes Jul 15, 2024

View reviewed changes

austinvazquez approved these changes Jul 19, 2024

View reviewed changes

sondavidb merged commit 846397c into awslabs:main Jul 19, 2024
11 checks passed

Kern-- mentioned this pull request Jul 22, 2024

Fallback to containerd if we are unable to fetch SOCI artifacts #1035

Closed

sondavidb deleted the early_defer_no_index branch July 22, 2024 17:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fallback to containerd if we are unable to fetch SOCI artifacts #1302

Fallback to containerd if we are unable to fetch SOCI artifacts #1302

sondavidb commented Jul 3, 2024

sondavidb commented Jul 3, 2024

sondavidb commented Jul 8, 2024

sondavidb commented Jul 9, 2024

Kern-- left a comment

Fallback to containerd if we are unable to fetch SOCI artifacts #1302

Fallback to containerd if we are unable to fetch SOCI artifacts #1302

Conversation

sondavidb commented Jul 3, 2024

sondavidb commented Jul 3, 2024

sondavidb commented Jul 8, 2024

sondavidb commented Jul 9, 2024

Kern-- left a comment

Choose a reason for hiding this comment