rfc27: support partial hello responses #433

Merged (2 commits) on Nov 20, 2024

garlick (Member) commented Nov 15, 2024

This adds a partial-ok flag to the hello request and, if that flag is true, an optional allocated idset key to the response, which indicates which ranks of R remain allocated when some of the ranks have already been released.

The problem has been discussed in flux-framework/flux-core#6089.

grondo (Contributor) left a comment

This looks straightforward to me, but likely you'll want an ACK from a Fluxion developer as well.

garlick (Member, Author) commented Nov 20, 2024

After feedback from @trws, the allocated key is replaced with a free key, which makes more sense in retrospect.

grondo (Contributor) left a comment

LGTM! Since this got approval from @trws in principle, I think you could now set MWP.

Problem: when resources are partially allocated across a scheduler
reload, the HELLO protocol has no way to inform the scheduler
which resources of the job are allocated or freed.

Add an optional "free" idset key to the response.  If the key
is present, the ranks in the idset should be considered free.
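
As a minimal sketch of how a scheduler might consume such a response, the Python below removes the ranks listed in "free" from a job's ranks. The response shape, the `job_ranks` argument, and the helpers `parse_idset` and `allocated_ranks` are hypothetical names for illustration, not taken from RFC 27 or flux-core. The parser handles only the simple comma-and-range idset form such as `0-3,5`, and treating an absent key as "everything is still allocated" is an assumption.

```python
def parse_idset(text):
    """Decode a simple idset string such as "0-3,5" into a set of ints.

    Assumption: only comma-separated non-negative integers and
    hyphenated ranges are handled.
    """
    text = text.strip()
    ids = set()
    if not text:
        return ids
    for part in text.split(","):
        lo, sep, hi = part.partition("-")
        if sep:
            # "lo-hi" is an inclusive range.
            ids.update(range(int(lo), int(hi) + 1))
        else:
            ids.add(int(lo))
    return ids


def allocated_ranks(job_ranks, response):
    """Return the ranks of a job that remain allocated.

    job_ranks: idset string of the ranks in the job's R (hypothetical input).
    response:  dict decoded from a hello response (hypothetical shape).
    """
    ranks = parse_idset(job_ranks)
    free = response.get("free")
    if free is None:
        # Assumed: with no "free" key, all ranks of R are still allocated.
        return ranks
    # Ranks named in "free" are considered free, so drop them.
    return ranks - parse_idset(free)
```

For example, `allocated_ranks("0-3", {"free": "2-3"})` leaves ranks 0 and 1 allocated.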

See also: flux-framework/flux-core#6089

Problem: there is no way for the scheduler to indicate that it
can handle the new 'free' key in hello responses.

The scheduler can put {"partial-ok":true} in the hello request
to indicate this.
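
A minimal sketch of the opt-in, assuming the hello request payload is a JSON object: the scheduler sets "partial-ok", and the responder adds "free" only when the flag is present. The function names `make_hello_request` and `build_response`, and the "id" key, are hypothetical; only "partial-ok" and "free" come from the commit messages above. What the responder should do with a partially released job when the flag is absent is not covered here.

```python
import json


def make_hello_request(partial_ok=True):
    """Encode a hello request payload as JSON (hypothetical helper)."""
    payload = {}
    if partial_ok:
        # Signals that the scheduler can handle "free" in responses.
        payload["partial-ok"] = True
    return json.dumps(payload)


def build_response(request_json, job_id, free_idset=None):
    """Build a response for one job (hypothetical shape).

    The "free" key is added only when the requester opted in with
    "partial-ok" and some ranks have already been released.
    """
    request = json.loads(request_json)
    entry = {"id": job_id}  # "id" is an illustrative key
    if free_idset and request.get("partial-ok") is True:
        entry["free"] = free_idset
    # Handling of a partially released job when "partial-ok" is absent
    # is outside the scope of this sketch.
    return entry
```

Keeping the key optional and gated on the request flag means a scheduler that never sends "partial-ok" sees responses without the new key.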
garlick force-pushed the rfc27_partial_hello branch from 98e84e6 to b7b1284 on November 20, 2024 17:54
mergify bot merged commit 8f966ce into flux-framework:master on Nov 20, 2024
7 checks passed
garlick deleted the rfc27_partial_hello branch on November 20, 2024 18:44