Add graph-based pod stop #25169

Open · wants to merge 2 commits into main from graph_stop
Conversation

@mheon (Member) commented Jan 30, 2025

Implement a graph-based pod stop and use it by default, ensuring that containers stop in dependency order. This prevents race conditions where application containers were stopped after the infra container, leaving them without functional networking for the last seconds before they stopped and potentially causing unexpected application errors.

As a pleasant side effect, container removal within a pod is now parallel, which should improve performance.

Full details in commit descriptions.

Does this PR introduce a user-facing change?

Containers in pods are now stopped in dependency order, with the infra container stopped last. This prevents application containers from losing networking before they stop due to the infra container stopping prematurely.

@openshift-ci bot added the release-note and approved (indicates a PR has been approved by an approver from all required OWNERS files) labels on Jan 30, 2025
@mheon added the No New Tests (allow PR to proceed without adding regression tests) label on Jan 30, 2025
@mheon (Member, Author) commented Jan 30, 2025

Tagging no new tests as the existing pod stop/remove tests should exercise this. Would be nice to test the ordering aspect but I'm not sure if that's doable without being very racy.

@Luap99 (Member) commented Jan 30, 2025

For tests, one thing we could do is check the FinishedAt time from inspect for the infra and application containers after pod stop, then make sure the infra container's is later.
That would guarantee it was stopped last, and it should be easy to add to an existing pod stop test.
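
A sketch of that check (editor's illustration, not code from this PR): the container names are hypothetical, and timeFormat plus the PodmanExitCleanly helper are assumed from the e2e test harness, matching the test fragments quoted later in this review.

// After `podman pod stop`, compare FinishedAt for an application
// container and the infra container; the infra one must be later.
ctrStop := podmanTest.PodmanExitCleanly("inspect", "--format", "{{ .State.FinishedAt }}", "app-ctr")
ctrStopTime, err := time.Parse(timeFormat, ctrStop.OutputToString())
Expect(err).ShouldNot(HaveOccurred())

infraStop := podmanTest.PodmanExitCleanly("inspect", "--format", "{{ .State.FinishedAt }}", "infra-ctr")
infraStopTime, err := time.Parse(timeFormat, infraStop.OutputToString())
Expect(err).ShouldNot(HaveOccurred())

Expect(infraStopTime).To(BeTemporally(">", ctrStopTime))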

@Luap99 (Member) left a comment

will do a proper review tomorrow

Comment on lines 98 to 101
exists, err := c.runtime.state.HasContainer(c.ID())
if err != nil {
	return err
}
if !exists {
	return fmt.Errorf("container %s does not exist in database: %w", c.ID(), define.ErrNoSuchCtr)
}
@Luap99 (Member) commented:

That adds extra DB query overhead; I'm not sure we need this at all.
Technically, if the pod doesn't exist you could just ignore the error, i.e. only take the locks inside an if err == nil {} block.
The other code would then return its normal error anyway.
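
A minimal sketch of the suggested pattern (editor's illustration; the state.Pod lookup and pod.lock field are assumed to mirror the snippet above):

// Look up the pod, but only take its lock when the lookup succeeds.
// If the pod is gone, fall through: the existing code paths will
// return their normal error anyway.
if c.config.Pod != "" {
	pod, err := c.runtime.state.Pod(c.config.Pod)
	if err == nil {
		// Pod locks come before container locks.
		pod.lock.Lock()
		defer pod.lock.Unlock()
	}
}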

@mheon (Member, Author) replied:

Oh, I like that. Will fix.

Comment on lines 89 to 93
// Have to lock the pod the container is a part of.
// This prevents running `podman start` at the same time a
// `podman pod stop` is running, which could lead to weird races.
// Pod locks come before container locks, so do this first.
if c.config.Pod != "" {
@Luap99 (Member) commented:

We should do the same thing around stop, for consistency.

@mheon force-pushed the graph_stop branch 2 times, most recently from 9c9d10f to bee225e on January 31, 2025 13:24
@mheon (Member, Author) commented Jan 31, 2025

Oops. Forgot to wait for the parallel executors to finish, which created horrible races. Fixed now.

@mheon force-pushed the graph_stop branch 11 times, most recently from ed5adc4 to 3aa18df on January 31, 2025 21:27

Ephemeral COPR build failed. @containers/packit-build please check.


Cockpit tests failed for commit 3aa18df. @martinpitt, @jelly, @mvollmer please check.

@mheon force-pushed the graph_stop branch 3 times, most recently from 2fb56ab to 8f21503 on January 31, 2025 23:32
@martinpitt (Contributor) commented:

The cockpit test failure from above is essentially this:

# podman pod rm --force --time 0 --all

Error: not all containers could be removed from pod f99a8d18b9b1cf2c8a4951fcce467057f5477ab385b9eb23d38b912ad93120eb: removing pod containers
Error: error removing container 52e324f5fe703a53d4ac0fdfa66fd4914ffb5b7dfdcbc2d3b6d88eccb12b946c from pod f99a8d18b9b1cf2c8a4951fcce467057f5477ab385b9eb23d38b912ad93120eb: container 52e324f5fe703a53d4ac0fdfa66fd4914ffb5b7dfdcbc2d3b6d88eccb12b946c has dependent containers which must be removed before it: df56d19ae6e61a5dc779e1e6e1994734e2e490ed0e8769efa2fb48a29e13ce6f: container already exists

This feels related to this change? The latest push passed, but commit 3aa18df also passed on F41 and only failed once on Rawhide -- so this feels like a race condition, or possibly file system ordering, i.e. something not reliably reproducible? Does that ring a bell?

@Luap99 (Member) commented Feb 3, 2025

@martinpitt Most of our tests fail as well, so yes, this patch is broken and cannot be merged as-is.


Ephemeral COPR build failed. @containers/packit-build please check.

The intention behind this is to stop races between
`pod stop|start` and `container stop|start` being run at the same
time. This could result in containers with no working network
(they join the still-running infra container's netns, which is
then torn down as the infra container is stopped, leaving the
container in an otherwise unused, nonfunctional, orphan netns).

Locking the pod (if present) in the public container start and
stop APIs should be sufficient to stop this.

Signed-off-by: Matt Heon <[email protected]>
@mheon force-pushed the graph_stop branch 2 times, most recently from a080ae2 to 041c6a3 on February 3, 2025 16:51
@mheon (Member, Author) commented Feb 3, 2025

@Luap99 This is ready for review now

@baude (Member) commented Feb 3, 2025

LGTM

@Luap99 (Member) left a comment

A few comments; I have a hard time following the graph logic.

For start, I understand that it makes sense to not continue starting when a dependency failed. But for stop I am not sure: even if we fail to stop one container, I feel like we should still stop all the other dependencies regardless.

The other thing is the locking in traverseNodeInwards(): the number of special cases and the locking/unlocking of node.lock and nodeDetails.lock makes me uneasy, but I cannot offer anything better, and tests pass, so I guess I just have to live with it.

infraStopTime, err := time.Parse(timeFormat, infraStop.OutputToString())
Expect(err).ShouldNot(HaveOccurred())

Expect(ctrStopTime.Before(infraStopTime)).To(BeTrue())
@Luap99 (Member) commented:

Please use Expect(infraStopTime).To(BeTemporally(">", ctrStopTime)) to provide useful errors when this fails:
https://onsi.github.io/gomega/#betemporallycomparator-string-compareto-timetime-threshold-timeduration

With your code you just get the unhelpful "expected false to be true" error.

Comment on lines 239 to 240
podInspect := podmanTest.PodmanExitCleanly("pod", "inspect", "--format", "{{ .InfraContainerID }}", podName)
infraCtrID := podInspect.OutputToString()
@Luap99 (Member) commented:

You can skip the inspect call if you give the infra container a name at creation time with --infra-name instead.
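
A sketch of that simplification (editor's illustration; the pod and infra container names are hypothetical):

// Name the infra container at pod creation time...
podmanTest.PodmanExitCleanly("pod", "create", "--name", podName, "--infra-name", "test-infra")

// ...then reference it directly by name, with no `pod inspect`
// round-trip to resolve the infra container ID.
infraStop := podmanTest.PodmanExitCleanly("inspect", "--format", "{{ .State.FinishedAt }}", "test-infra")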

}
}

if len(ctrErrors) > 0 {
	if len(ctrErrors) > 1 {
@Luap99 (Member) commented:

Why this change? It seems like we skip reporting errors when one ctr failed?

Comment on lines 428 to 429
ctr.lock.Unlock()

if cleanup {
	return ctr.Cleanup(ctx, false)
}
@Luap99 (Member) commented:

This seems like fragile locking, i.e. if new early returns are added without unlocking.
It also unnecessarily unlocks just to immediately lock again. Would it not be better to have an internal version of Cleanup() that does not take the lock?
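
A minimal sketch of the suggested split (editor's illustration; cleanupLocked and the parameter name are hypothetical, not existing libpod identifiers):

// Public API: takes the container lock, then delegates.
func (c *Container) Cleanup(ctx context.Context, onlyStopped bool) error {
	c.lock.Lock()
	defer c.lock.Unlock()
	return c.cleanupLocked(ctx, onlyStopped)
}

// cleanupLocked assumes c.lock is already held, so callers that hold
// the lock (like the snippet above) can call it directly instead of
// unlocking just to immediately relock.
func (c *Container) cleanupLocked(ctx context.Context, onlyStopped bool) error {
	// ... actual cleanup work ...
	return nil
}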

@Luap99 removed the No New Tests (allow PR to proceed without adding regression tests) label on Feb 4, 2025
First, refactor our existing graph traversal code to improve code
sharing. There still isn't much sharing between inward traversal
(stop, remove) and outward traversal (start), but stop and remove
now share most of their code, which seems like a positive.

Second, add a new graph-traversal function to stop containers.
We already had start and remove; stop uses the newly-refactored
inward-traversal code which it shares with removal.

Third, rework the shared stop/removal inward-traversal code to
add locking. This allows parallel execution of stop and removal,
which should improve the performance of `podman pod rm` and
retain the performance of `podman pod stop` at about what it is
right now.

Fourth and finally, use the new graph-based stop when possible
to solve unordered stop problems with pods - specifically, the
infra container stopping before application containers, leaving
those containers without a working network.

Fixes https://issues.redhat.com/browse/RHEL-76827

Signed-off-by: Matt Heon <[email protected]>
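
To make the inward traversal concrete, here is a minimal, self-contained sketch (editor's illustration, not the PR's actual code; the node layout and locking are heavily simplified). Each container stops only once every container that depends on it has stopped, so the infra container, which everything else depends on, goes last.

package main

import (
	"fmt"
	"sync"
)

// node is a simplified container-graph node: deps are the containers
// this one depends on (e.g. the infra container); pending counts the
// dependents that have not yet stopped.
type node struct {
	id      string
	deps    []*node
	mu      sync.Mutex
	stopped bool
	pending int
}

// stopNode stops n once all of its dependents are done, then walks
// inward, releasing each dependency whose dependents have all stopped.
func stopNode(n *node, wg *sync.WaitGroup) {
	defer wg.Done()

	n.mu.Lock()
	if n.stopped || n.pending > 0 {
		n.mu.Unlock()
		return
	}
	n.stopped = true
	n.mu.Unlock()

	fmt.Println("stopping", n.id) // the real code would call the container stop API here

	for _, dep := range n.deps {
		dep.mu.Lock()
		dep.pending--
		ready := dep.pending == 0
		dep.mu.Unlock()
		if ready {
			wg.Add(1)
			go stopNode(dep, wg) // parallel inward traversal
		}
	}
}

func main() {
	infra := &node{id: "infra", pending: 2}
	app1 := &node{id: "app1", deps: []*node{infra}}
	app2 := &node{id: "app2", deps: []*node{infra}}

	var wg sync.WaitGroup
	wg.Add(2)
	go stopNode(app1, &wg) // leaves first: nothing depends on the app containers
	go stopNode(app2, &wg)
	wg.Wait() // prints app1/app2 (in either order), then infra last
}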
@Luap99 (Member) left a comment

LGTM. Your PR was not rebased on the last push, which means I see flakes that have already been fixed on main. Please remember to rebase.

openshift-ci bot (Contributor) commented Feb 7, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Luap99, mheon

The full list of commands accepted by this bot can be found here.

The pull request process is described here


Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mheon (Member, Author) commented Feb 7, 2025

@containers/podman-maintainers PTAL and merge

Labels: approved (indicates a PR has been approved by an approver from all required OWNERS files), release-note
Projects: none yet
4 participants