plugin: enforce max resource limits across an association's running jobs #559

cmoussa1 · 2025-01-07T22:34:28Z

Creating a tracking issue here to outline the idea for enforcing a max number of resources used across an association's set of running jobs. I already have a couple of open issues similar to this, but it would be useful to re-organize some thoughts after some helpful offline discussion.

The need here is to be able to limit how many resources (e.g., nodes, cores) an association can have at any given time across all of their running jobs. As noted in flux-config-policy(5), the limit checks take place before the scheduler sees the request because [the plugin] does not have detailed resource information.

So, it seems a realistic solution here would be to configure a max resources limit that is both a max nodes and a max cores limit. The priority plugin should be able to keep track of both when a job enters RUN state by looking at the jobspec. It can increment/decrement current node and core counts per-association across all of their running jobs. Then, when a submitted job enters DEPEND state, the job's size can be checked to see if adding its resources to the association's currently allocated resources would put them over the max (i.e., over either the nodes or the cores limit). If so, the job can be held until a currently running job exits.
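The check described above can be sketched in a few lines. This is only an illustrative Python model, not the plugin's code: the `Association` class here and its `would_exceed` method are hypothetical, and the field names are borrowed from the attribute names used in this issue.

```python
from dataclasses import dataclass


@dataclass
class Association:
    """Hypothetical per-association record; field names follow this issue."""
    max_nodes: int
    max_cores: int
    cur_nodes: int = 0
    cur_cores: int = 0

    def would_exceed(self, nnodes: int, ncores: int) -> bool:
        # The job is over the limit if EITHER resource would pass its max.
        return (self.cur_nodes + nnodes > self.max_nodes
                or self.cur_cores + ncores > self.max_cores)


a = Association(max_nodes=4, max_cores=64, cur_nodes=3, cur_cores=48)
fits = not a.would_exceed(nnodes=1, ncores=16)  # lands exactly on both limits
held = a.would_exceed(nnodes=2, ncores=16)      # nodes would reach 5 > 4
```

Treating a job that lands exactly on a limit as allowed is an assumption of this sketch; the issue text only says a job should be held when it would put the association "over" the max.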

There are a couple of prerequisites to get this kind of support into flux-accounting:

Tasks

- add max_cores column to association_table
- send max_cores information to priority plugin during bulk update
- association_table: add max_cores attribute, send information to plugin #560 (labels: database merge-when-passing new feature plugin)
- create cur_nodes, cur_cores members for Association class
- track resource usage per-job for the association who submitted the job
- create check for a submitted job to see if it would put association over either one of their resource limits
- add/remove dependencies on held jobs due to max resource usage

I've done some playing around today with a rough sketch and it looks like the first four tasks listed are pretty straightforward; copying over the jj code from flux-core, I'm able to extract job size counts and add/subtract them from an association's cur_nodes and cur_cores attributes as jobs enter RUN and INACTIVE states.
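To make the counting step concrete, here is a rough Python sketch of pulling node and core counts out of a jobspec. It is not the jj code and does not claim to match its behavior. It assumes a simple jobspec whose `resources` section is a `node -> slot -> core` (or `slot -> core`) chain with integer counts, and `count_resources` is a hypothetical helper name.

```python
def count_resources(jobspec: dict) -> tuple:
    """Return (nnodes, ncores) for a simple node -> slot -> core request."""
    counts = {"node": 0, "slot": 0, "core": 0}

    def walk(level):
        # Record the count of each resource type of interest, then descend.
        for vertex in level:
            if vertex["type"] in counts:
                counts[vertex["type"]] = int(vertex["count"])
            walk(vertex.get("with", []))

    walk(jobspec["resources"])
    nslots = counts["slot"]
    if counts["node"]:
        nslots *= counts["node"]  # assumption: slots are requested per node
    return counts["node"], nslots * counts["core"]


jobspec = {
    "resources": [
        {"type": "node", "count": 2, "with": [
            {"type": "slot", "count": 1, "label": "task", "with": [
                {"type": "core", "count": 4},
            ]},
        ]},
    ],
}
nnodes, ncores = count_resources(jobspec)  # (2, 8)
```

In this sketch a request that names no node count yields `nnodes == 0`, so only the core total would be available for such a job.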

I'll plan to start opening incremental PRs to add this kind of support into flux-accounting.

cmoussa1 · 2025-01-08T17:37:19Z

Had a helpful offline discussion with @ryanday36 about a possible implementation plan for how this might work in the priority plugin:

The priority plugin will have max_nodes, max_cores, cur_nodes, and cur_cores information stored per-association in its internal map. This information can be queried with flux jobtap query to see where an association stands at any given time.

When a job proceeds to job.state.run, its resource information will be extracted from jobspec. It will use the jj code to count both nnodes and ncores and increment the association's cur_nodes and cur_cores count accordingly.

While an association has jobs running, each subsequently submitted job will have its resource counts checked in job.state.depend. If the job's resource counts (nnodes or ncores) would put the association over either their max_nodes or max_cores limit, the job will have an accounting-specific dependency added to it describing that the association has hit their max resources limit, and the job will be held.

Jobs will be held until a currently running job transitions to INACTIVE. When the running job transitions to INACTIVE, its resources will again be extracted from jobspec and subtracted from the association's cur_nodes and cur_cores counts. Then, when the association's cur_running_jobs count is checked to ensure that they are allowed to have a running job at this moment, the held job's resource counts will also be checked to ensure that releasing it would not put the association over their max. (I need to see if I can retrieve a jobspec in a jobtap plugin with just the jobid.) If it would not, the job can be released and proceed to RUN.
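The plan above can be walked through with a small Python simulation. Everything in it is hypothetical: the `Association` class, the `on_depend`, `on_run`, and `on_inactive` functions, and the return values are illustrative names, not jobtap callbacks or plugin code. Two choices are assumptions of the sketch rather than part of the plan: counts are stored alongside each held job (which sidesteps the open question about retrieving a jobspec by jobid), and held jobs are rechecked in submission order with a running tally so that one pass cannot release more than fits.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Association:
    max_nodes: int
    max_cores: int
    cur_nodes: int = 0
    cur_cores: int = 0
    held: deque = field(default_factory=deque)  # entries: (jobid, nnodes, ncores)

    def fits(self, nnodes, ncores):
        return (self.cur_nodes + nnodes <= self.max_nodes
                and self.cur_cores + ncores <= self.max_cores)


def on_depend(a, jobid, nnodes, ncores):
    """Hold the job if it would push the association over either limit."""
    if a.fits(nnodes, ncores):
        return "released"
    a.held.append((jobid, nnodes, ncores))
    return "held"


def on_run(a, nnodes, ncores):
    """Count the job's resources once it is actually running."""
    a.cur_nodes += nnodes
    a.cur_cores += ncores


def on_inactive(a, nnodes, ncores):
    """Subtract the finished job's resources, then recheck held jobs."""
    a.cur_nodes -= nnodes
    a.cur_cores -= ncores
    released, still_held = [], deque()
    pend_nodes = pend_cores = 0  # resources already promised in this pass
    for jobid, n, c in a.held:
        if a.fits(pend_nodes + n, pend_cores + c):
            released.append((jobid, n, c))
            pend_nodes += n
            pend_cores += c
        else:
            still_held.append((jobid, n, c))
    a.held = still_held
    return released


a = Association(max_nodes=2, max_cores=8)
first = on_depend(a, 1, 2, 8)      # fits, so job 1 proceeds
on_run(a, 2, 8)
second = on_depend(a, 2, 1, 4)     # would exceed both limits, so job 2 is held
released = on_inactive(a, 2, 8)    # job 1 finishes and job 2 now fits
for _, n, c in released:
    on_run(a, n, c)
```

Because counts are only incremented in the RUN step, as the plan describes, a job that has passed the DEPEND check but not yet started is not reflected in cur_nodes and cur_cores; the sketch does not try to close that window.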

cmoussa1 added labels on Jan 7, 2025: feature; tracking (Tracking issue for larger feature made up of smaller issues); plugin (related to the multi-factor priority plugin).
