Add graph-based pod stop #25169

Open · wants to merge 2 commits into main from graph_stop
Conversation

@mheon (Member) commented Jan 30, 2025

Implement a graph-based pod stop and use it by default, ensuring that containers stop in dependency order. This prevents race conditions where application containers were stopped after the infra container, leaving them without functional networking for the last seconds before they stopped and potentially causing unexpected application errors.

As a pleasant side effect, container removal within a pod is now parallel, which should improve performance.

Full details in commit descriptions.

Does this PR introduce a user-facing change?

Containers in pods are now stopped in dependency order, with the infra container stopped last. This prevents application containers from losing networking before they stop due to the infra container stopping prematurely.

@openshift-ci bot added the release-note and approved (indicates a PR has been approved by an approver from all required OWNERS files) labels on Jan 30, 2025
@mheon added the No New Tests (allow PR to proceed without adding regression tests) label on Jan 30, 2025
@mheon (Member, Author) commented Jan 30, 2025

Tagging no new tests as the existing pod stop/remove tests should exercise this. Would be nice to test the ordering aspect but I'm not sure if that's doable without being very racy.

@Luap99 (Member) commented Jan 30, 2025

For tests, one thing we could do is check the FinishedAt time from inspect for the infra and application containers after pod stop, then make sure the infra container's is later.
That would guarantee it was stopped last, and it should be easy to add to an existing pod stop test.
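
A sketch of that check (editor's illustration, not code from this PR): the container names are hypothetical, and timeFormat plus the PodmanExitCleanly helper are assumed from the e2e test harness, matching the test fragments quoted later in this review.

// After `podman pod stop`, compare FinishedAt for an application
// container and the infra container; the infra one must be later.
ctrStop := podmanTest.PodmanExitCleanly("inspect", "--format", "{{ .State.FinishedAt }}", "app-ctr")
ctrStopTime, err := time.Parse(timeFormat, ctrStop.OutputToString())
Expect(err).ShouldNot(HaveOccurred())

infraStop := podmanTest.PodmanExitCleanly("inspect", "--format", "{{ .State.FinishedAt }}", "infra-ctr")
infraStopTime, err := time.Parse(timeFormat, infraStop.OutputToString())
Expect(err).ShouldNot(HaveOccurred())

Expect(infraStopTime).To(BeTemporally(">", ctrStopTime))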

@Luap99 (Member) left a comment

will do a proper review tomorrow

Comment on lines 98 to 101
exists, err := c.runtime.state.HasContainer(c.ID())
if err != nil {
	return err
}
if !exists {
	return fmt.Errorf("container %s does not exist in database: %w", c.ID(), define.ErrNoSuchCtr)
}
@Luap99 (Member) commented:

That adds extra DB query overhead; I'm not sure we need this at all.
Technically, if the pod doesn't exist you could just ignore the error, i.e. only take the locks inside an if err == nil {} block.
The other code would then return its normal error anyway.
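
A minimal sketch of the suggested pattern (editor's illustration; the state.Pod lookup and pod.lock field are assumed to mirror the snippet above):

// Look up the pod, but only take its lock when the lookup succeeds.
// If the pod is gone, fall through: the existing code paths will
// return their normal error anyway.
if c.config.Pod != "" {
	pod, err := c.runtime.state.Pod(c.config.Pod)
	if err == nil {
		// Pod locks come before container locks.
		pod.lock.Lock()
		defer pod.lock.Unlock()
	}
}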

@mheon (Member, Author) replied:

Oh, I like that. Will fix.

Comment on lines 89 to 93
// Have to lock the pod the container is a part of.
// This prevents running `podman start` at the same time a
// `podman pod stop` is running, which could lead to weird races.
// Pod locks come before container locks, so do this first.
if c.config.Pod != "" {
@Luap99 (Member) commented:

We should do the same thing around stop, for consistency.

@mheon force-pushed the graph_stop branch 2 times, most recently from 9c9d10f to bee225e on January 31, 2025 13:24
@mheon (Member, Author) commented Jan 31, 2025

Oops. Forgot to wait for the parallel executors to finish, which created horrible races. Fixed now.

@mheon force-pushed the graph_stop branch 11 times, most recently from ed5adc4 to 3aa18df on January 31, 2025 21:27

Ephemeral COPR build failed. @containers/packit-build please check.


Cockpit tests failed for commit 3aa18df. @martinpitt, @jelly, @mvollmer please check.

@mheon force-pushed the graph_stop branch 3 times, most recently from 2fb56ab to 8f21503 on January 31, 2025 23:32
@martinpitt (Contributor) commented:

The cockpit test failure from above is essentially this:

# podman pod rm --force --time 0 --all

Error: not all containers could be removed from pod f99a8d18b9b1cf2c8a4951fcce467057f5477ab385b9eb23d38b912ad93120eb: removing pod containers
Error: error removing container 52e324f5fe703a53d4ac0fdfa66fd4914ffb5b7dfdcbc2d3b6d88eccb12b946c from pod f99a8d18b9b1cf2c8a4951fcce467057f5477ab385b9eb23d38b912ad93120eb: container 52e324f5fe703a53d4ac0fdfa66fd4914ffb5b7dfdcbc2d3b6d88eccb12b946c has dependent containers which must be removed before it: df56d19ae6e61a5dc779e1e6e1994734e2e490ed0e8769efa2fb48a29e13ce6f: container already exists

This feels related to this change? The latest push passed, but commit 3aa18df also passed on F41 and only failed once on Rawhide -- so this feels like a race condition, or possibly file system ordering, i.e. something not reliably reproducible? Does that ring a bell?

@Luap99 (Member) commented Feb 3, 2025

@martinpitt Most of our tests fail as well, so yes, this patch is broken and cannot be merged as-is.


Ephemeral COPR build failed. @containers/packit-build please check.

The intention behind this is to stop races between
`pod stop|start` and `container stop|start` being run at the same
time. This could result in containers with no working network
(they join the still-running infra container's netns, which is
then torn down as the infra container is stopped, leaving the
container in an otherwise unused, nonfunctional, orphan netns).

Locking the pod (if present) in the public container start and
stop APIs should be sufficient to stop this.

Signed-off-by: Matt Heon <[email protected]>
@mheon force-pushed the graph_stop branch 2 times, most recently from a080ae2 to 041c6a3 on February 3, 2025 16:51
@mheon (Member, Author) commented Feb 3, 2025

@Luap99 This is ready for review now

@baude (Member) commented Feb 3, 2025

LGTM

@Luap99 (Member) left a comment

A few comments; I have a hard time following the graph logic.

For start, I understand that it makes sense to not continue starting when a dependency failed. But for stop I am not sure: even if we fail to stop one container, I feel like we should still stop all the other dependencies regardless.

The other thing is the locking in traverseNodeInwards(): the number of special cases and the locking/unlocking of node.lock and nodeDetails.lock makes me uneasy, but I cannot offer anything better, and tests pass, so I guess I just have to live with it.

infraStopTime, err := time.Parse(timeFormat, infraStop.OutputToString())
Expect(err).ShouldNot(HaveOccurred())

Expect(ctrStopTime.Before(infraStopTime)).To(BeTrue())
@Luap99 (Member) commented:

Please use Expect(infraStopTime).To(BeTemporally(">", ctrStopTime)) to provide useful errors when this fails:
https://onsi.github.io/gomega/#betemporallycomparator-string-compareto-timetime-threshold-timeduration

With your code you just get the unhelpful "expected false to be true" error.

Comment on lines 239 to 240
podInspect := podmanTest.PodmanExitCleanly("pod", "inspect", "--format", "{{ .InfraContainerID }}", podName)
infraCtrID := podInspect.OutputToString()
@Luap99 (Member) commented:

You can skip the inspect call if you give the infra container a name at creation time with --infra-name instead.
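
A sketch of that simplification (editor's illustration; the pod and infra container names are hypothetical):

// Name the infra container at pod creation time...
podmanTest.PodmanExitCleanly("pod", "create", "--name", podName, "--infra-name", "test-infra")

// ...then reference it directly by name, with no `pod inspect`
// round-trip to resolve the infra container ID.
infraStop := podmanTest.PodmanExitCleanly("inspect", "--format", "{{ .State.FinishedAt }}", "test-infra")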

}
}

if len(ctrErrors) > 0 {
	if len(ctrErrors) > 1 {
@Luap99 (Member) commented:

Why this change? It seems like we skip reporting errors when one ctr failed?

Comment on lines 428 to 429
ctr.lock.Unlock()

if cleanup {
	return ctr.Cleanup(ctx, false)
}
@Luap99 (Member) commented:

This seems like fragile locking, i.e. if new early returns are added without unlocking.
It also unnecessarily unlocks just to immediately lock again. Would it not be better to have an internal version of Cleanup() that does not take the lock?
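
A minimal sketch of the suggested split (editor's illustration; cleanupLocked and the parameter name are hypothetical, not existing libpod identifiers):

// Public API: takes the container lock, then delegates.
func (c *Container) Cleanup(ctx context.Context, onlyStopped bool) error {
	c.lock.Lock()
	defer c.lock.Unlock()
	return c.cleanupLocked(ctx, onlyStopped)
}

// cleanupLocked assumes c.lock is already held, so callers that hold
// the lock (like the snippet above) can call it directly instead of
// unlocking just to immediately relock.
func (c *Container) cleanupLocked(ctx context.Context, onlyStopped bool) error {
	// ... actual cleanup work ...
	return nil
}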

@Luap99 removed the No New Tests (allow PR to proceed without adding regression tests) label on Feb 4, 2025
First, refactor our existing graph traversal code to improve code
sharing. There still isn't much sharing between inward traversal
(stop, remove) and outward traversal (start), but stop and remove
now share most of their code, which seems like a positive.

Second, add a new graph-traversal function to stop containers.
We already had start and remove; stop uses the newly-refactored
inward-traversal code which it shares with removal.

Third, rework the shared stop/removal inward-traversal code to
add locking. This allows parallel execution of stop and removal,
which should improve the performance of `podman pod rm` and
retain the performance of `podman pod stop` at about what it is
right now.

Fourth and finally, use the new graph-based stop when possible
to solve unordered stop problems with pods - specifically, the
infra container stopping before application containers, leaving
those containers without a working network.

Fixes https://issues.redhat.com/browse/RHEL-76827

Signed-off-by: Matt Heon <[email protected]>
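
To make the inward traversal concrete, here is a minimal, self-contained sketch (editor's illustration, not the PR's actual code; the node layout and locking are heavily simplified). Each container stops only once every container that depends on it has stopped, so the infra container, which everything else depends on, goes last.

package main

import (
	"fmt"
	"sync"
)

// node is a simplified container-graph node: deps are the containers
// this one depends on (e.g. the infra container); pending counts the
// dependents that have not yet stopped.
type node struct {
	id      string
	deps    []*node
	mu      sync.Mutex
	stopped bool
	pending int
}

// stopNode stops n once all of its dependents are done, then walks
// inward, releasing each dependency whose dependents have all stopped.
func stopNode(n *node, wg *sync.WaitGroup) {
	defer wg.Done()

	n.mu.Lock()
	if n.stopped || n.pending > 0 {
		n.mu.Unlock()
		return
	}
	n.stopped = true
	n.mu.Unlock()

	fmt.Println("stopping", n.id) // the real code would call the container stop API here

	for _, dep := range n.deps {
		dep.mu.Lock()
		dep.pending--
		ready := dep.pending == 0
		dep.mu.Unlock()
		if ready {
			wg.Add(1)
			go stopNode(dep, wg) // parallel inward traversal
		}
	}
}

func main() {
	infra := &node{id: "infra", pending: 2}
	app1 := &node{id: "app1", deps: []*node{infra}}
	app2 := &node{id: "app2", deps: []*node{infra}}

	var wg sync.WaitGroup
	wg.Add(2)
	go stopNode(app1, &wg) // leaves first: nothing depends on the app containers
	go stopNode(app2, &wg)
	wg.Wait() // prints app1/app2 (in either order), then infra last
}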
@Luap99 (Member) left a comment

LGTM. Your PR was not rebased on the last push, which means I see flakes that have already been fixed on main. Please remember to rebase.

openshift-ci bot (Contributor) commented Feb 7, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Luap99, mheon

The full list of commands accepted by this bot can be found here.

The pull request process is described here


Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mheon (Member, Author) commented Feb 7, 2025

@containers/podman-maintainers PTAL and merge

Labels: approved (indicates a PR has been approved by an approver from all required OWNERS files), release-note
Projects: none yet
4 participants