multi-factor priority plugin: add new "age" factor #291

Open
cmoussa1 opened this issue Oct 12, 2022 · 9 comments
Labels
idea An idea for a new feature or change

@cmoussa1
Member

Another factor that could be added to the multi-factor priority plugin is an "age" factor, or a factor that represents the amount of time that a job has been sitting in a queue waiting to run. The priority for a job, in general, should increase the longer it sits in a queue eligible to run.

A first thought: perhaps the t_submit field can be extracted when the job enters job.validate and stored along with the other job information the plugin needs. Then, when the priority is calculated, t_submit can be subtracted from the current time to determine the job's age, which yields an age factor that increases the job's priority. The age factor should also probably carry less weight than the fair-share factor.
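A minimal sketch of the age computation described above, assuming t_submit is a double holding wall-clock seconds since the epoch and that "now" is read from the same clock. The function names `now_seconds`, `job_age`, and `age_factor`, and the `max_age` cap, are hypothetical. The cap follows the approach of Slurm's `PriorityMaxAge`: the factor grows linearly to 1.0 and then stays there, which keeps the age contribution bounded so it can be weighted below fair-share.

```cpp
#include <algorithm>
#include <chrono>

// Current wall-clock time in seconds since the epoch, as a double
// (assumed to be on the same clock as t_submit).
double now_seconds ()
{
    using namespace std::chrono;
    return duration<double> (system_clock::now ().time_since_epoch ()).count ();
}

// Age of a job in seconds: time elapsed since t_submit.
// Clamped at 0 so clock skew never produces a negative age.
double job_age (double t_submit, double now)
{
    return std::max (0.0, now - t_submit);
}

// Hypothetical normalized age factor in [0.0, 1.0]: grows linearly
// with age until max_age seconds, then stays at 1.0.
double age_factor (double age, double max_age)
{
    if (max_age <= 0.0)
        return 0.0;
    return std::min (age / max_age, 1.0);
}
```

Because the factor saturates at 1.0, a job can never gain more than its age weight from waiting, no matter how long it sits in the queue.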

I think this would also work during a reprioritization, which calls job.priority.get for all jobs (by "all" jobs, I'm assuming all jobs that are not currently in RUN state). The priorities for these jobs would then be updated to reflect how long they have been waiting to run.

cmoussa1 added the idea label (An idea for a new feature or change) Oct 12, 2022
cmoussa1 self-assigned this Oct 12, 2022
@cmoussa1
Member Author

cmoussa1 commented Nov 1, 2022

After playing around with this some yesterday and this morning, I think I am getting closer to a working solution. Since jobtap plugins allow t_submit to be extracted from a job, I can pull it out as a double. During the priority calculation, t_submit can be subtracted from the current time to get an "age," which can further increase the priority of a job; i.e., the longer a job sits in a queue, the greater its age becomes, and the higher its integer priority:

priority = round ((fshare_weight * fshare_factor) +
                  (queue_weight * queue_factor) +
                  (age_weight * age_factor) +
                  (urgency - 16));
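The formula above can be written as a small self-contained function. This is a sketch, not the plugin's actual code: the `priority_factors` and `priority_weights` structs and their field names are hypothetical, the factors are assumed to be normalized to [0.0, 1.0], and urgency is taken to be Flux's 0-31 range with a default of 16, so `urgency - 16` is zero for a job submitted at default urgency. Clamping negative results to 0 is also an assumption, based on Flux job priorities being unsigned.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical container for the per-job inputs to the calculation.
struct priority_factors {
    double fshare_factor;  // fair-share factor, assumed in [0.0, 1.0]
    double queue_factor;   // queue factor
    double age_factor;     // age factor, assumed in [0.0, 1.0]
    int urgency;           // Flux urgency, 0-31, default 16
};

// Hypothetical weights; keeping age_weight below fshare_weight
// prevents age from outweighing fair-share.
struct priority_weights {
    double fshare_weight;
    double queue_weight;
    double age_weight;
};

// Weighted sum from the formula above, rounded to an integer.
// Negative results are clamped to 0 (assumption: priorities are unsigned).
std::int64_t compute_priority (const priority_factors &f,
                               const priority_weights &w)
{
    double p = std::round ((w.fshare_weight * f.fshare_factor)
                           + (w.queue_weight * f.queue_factor)
                           + (w.age_weight * f.age_factor)
                           + (f.urgency - 16));
    return p < 0.0 ? 0 : static_cast<std::int64_t> (p);
}
```

For example, with weights of 100000 (fair-share), 10000 (queue), and 1000 (age), a job with a fair-share factor of 0.5, a queue factor of 0, an age factor of 0.25, and default urgency gets a priority of 50000 + 250 = 50250.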

@cmoussa1
Member Author

cmoussa1 commented Nov 1, 2022

As I think about how to test this, I'm wondering how to keep testing for consistent priority values, which is what the current tests do. I could change the sharness tests to only check that each job receives a priority value, without comparing it to a specific value. The age factor is sure to change the integer priorities I currently test for, and I'm not sure I can guarantee they stay the same between runs.

@grondo
Contributor

grondo commented Nov 1, 2022

Maybe you'll want to add some configuration that controls whether to include or exclude the age factor in priority. This would allow you to test this factor separately from the others. I'm not sure if the priority plugin has any configuration at the moment, so that might be a bigger project than it at first seems.

As a stopgap, you could add a test-only RPC to the mf_priority module that disables the age in the priority calculation for tests that currently assume non-time-based priorities (this would probably be the easiest solution).

Otherwise, instead of comparing priorities to a specific value, you could ensure they are in an expected range, or are greater than or less than the priority of another job.
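The last two strategies can be expressed as properties that hold no matter when a test runs: a priority falls within bounds that follow from the weights, and of two otherwise identical jobs, the older one never has a lower priority. The sketch below illustrates that logic with a hypothetical fair-share-plus-age priority and assumed whole-number weights; the real tests are sharness scripts, so this is only a model of the checks.

```cpp
#include <cmath>

// Hypothetical priority with only fair-share and age components,
// both factors assumed to be in [0.0, 1.0].
double priority (double fshare_factor, double age_factor,
                 double fshare_weight, double age_weight)
{
    return std::round (fshare_weight * fshare_factor
                       + age_weight * age_factor);
}

// Range check: with the age factor in [0, 1] and a whole-number
// age_weight, aging can raise the priority by at most age_weight, so a
// test can accept anything from the age-free value up to that value
// plus age_weight.
bool in_expected_range (double p, double fshare_factor,
                        double fshare_weight, double age_weight)
{
    double lo = std::round (fshare_weight * fshare_factor);
    return p >= lo && p <= lo + age_weight;
}
```

An ordering check needs no bounds at all: submit two jobs under the same account and assert that the earlier one's priority is greater than or equal to the later one's.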

Sorry, I don't have any other great ideas at the moment.

@garlick
Member

garlick commented Nov 1, 2022

It might be separately useful to have TOML configuration support for flux-accounting. It's easier now that you can get the config through the conf.update jobtap topic. Here's an example.

@grondo
Contributor

grondo commented Nov 2, 2022

Good point. If you allow the factor weights to be configurable in TOML config, then you can set the age weight to 0 to eliminate it from the priority, plus allow the weights to be adjusted in configuration during normal use.
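To illustrate the effect of a configurable age weight: when the age weight is 0, the age term contributes nothing, so a job's priority is the same no matter how long it has waited, and tests can compare against exact values again. The `weights` struct and its default values are assumptions for illustration, not the plugin's actual configuration keys. Reading them from TOML (e.g. through the conf.update topic) is not shown, and the urgency term is omitted for brevity.

```cpp
#include <cmath>

// Hypothetical factor weights, as they might be read from TOML config.
struct weights {
    double fshare = 100000.0;
    double queue = 10000.0;
    double age = 1000.0;  // set to 0 to remove age from the calculation
};

// Priority for a job that has waited age_seconds; the age factor is
// age_seconds / max_age, capped at 1.0.
double priority_at (const weights &w, double fshare_factor,
                    double queue_factor, double age_seconds, double max_age)
{
    double age_factor =
        max_age > 0.0 ? std::fmin (age_seconds / max_age, 1.0) : 0.0;
    return std::round (w.fshare * fshare_factor
                       + w.queue * queue_factor
                       + w.age * age_factor);
}
```

With the default weights, a job that has waited 3000 of a 3600-second maximum gets about 833 more than the age-free value; with `age = 0.0`, both that job and a freshly submitted one get exactly 50000 for a fair-share factor of 0.5.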

@cmoussa1
Member Author

cmoussa1 commented Nov 2, 2022

Ah, these are great suggestions, thank you both! I agree that TOML configuration would be useful for both the plugin and flux-accounting. I have a couple of related issues open (#128 and #216) that I should circle back to once I complete this.

For now, would it be okay if I went with @grondo's suggestion of checking that the integer priorities returned are in an expected range? Most of the tests that check for an integer priority are just making sure that a priority is returned and are less concerned with the order of jobs.

If so, is there a recommended way you had thought of doing this? Were you thinking of using grep or awk?

EDIT: actually, I think I can accomplish this by using grep -c and passing a regular expression that matches a range of values in the file that holds the integer priority. 👍

@grondo
Contributor

grondo commented Nov 2, 2022

Sure! If the priority is a number, you can also just check that it is within a range. It all depends on what you are trying to test.

@cmoussa1
Member Author

Beginning to circle back to adding age as a factor to the priority plugin, and I had a good offline discussion with @grondo this morning about some of the things to look out for. One reminder we got a while ago from @watson6282 is that we should perhaps measure "age" from when a job is released instead of when it is submitted. In other words, count only the time during which the job was available to be scheduled but could not run because of another constraint (like a resource constraint) that would be satisfied if it were the next job up in the queue. This would prevent users from deliberately submitting held jobs, leaving them held for a long time, and then getting a large priority bump when they finally release them.

One suggestion from @grondo was to take the timestamp of the priority event that resulted in a nonzero priority and calculate the age from that. The implementation should also probably handle the case where a job's urgency is set to 0 and then set again to a nonzero value.
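One way to compute an eligibility-based age, assuming the plugin can observe a timestamp when a job becomes eligible (e.g. the priority event that first gives it a nonzero priority) and a timestamp when it is held again (urgency set to 0), is to accumulate only the intervals during which the job was eligible. The `eligible_age` class and its method names are hypothetical. Whether eligible time from before a hold should be kept or reset when the job is released again is a policy choice; this sketch keeps it.

```cpp
#include <algorithm>

// Hypothetical per-job tracker that counts only time spent eligible
// to run; time spent held (urgency 0) is excluded.
class eligible_age {
public:
    // Called when the job becomes eligible, e.g. at the timestamp of
    // the priority event that gave it a nonzero priority.
    void on_eligible (double t)
    {
        if (!eligible_) {
            eligible_ = true;
            since_ = t;
        }
    }

    // Called when the job is held again (urgency set to 0).
    void on_held (double t)
    {
        if (eligible_) {
            accumulated_ += std::max (0.0, t - since_);
            eligible_ = false;
        }
    }

    // Total eligible time in seconds as of time `now`.
    double age (double now) const
    {
        double current = eligible_ ? std::max (0.0, now - since_) : 0.0;
        return accumulated_ + current;
    }

private:
    bool eligible_ = false;
    double since_ = 0.0;        // start of the current eligible interval
    double accumulated_ = 0.0;  // eligible seconds from earlier intervals
};
```

Under this scheme, a job submitted held and released a week later starts with an age of 0 at release, which addresses the concern about held jobs receiving a large priority bump.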

@garlick / @ryanday36 - do you have any thoughts on a possible implementation? Should we consider a job's age to start when it was released (i.e., when it was actually eligible for scheduling)? Or should we simply use its submit time and handle the case where its urgency is changed from zero to nonzero?

@ryanday36
Contributor

Age should generally be measured from the time the job became eligible. If other sites start running Flux system instances, I could see a future request for a configuration option to count age from job submission instead, since that is something Slurm offers.
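If that request comes up, the choice could be a configuration option that selects the timestamp age is measured from. In Slurm, `PriorityFlags=ACCRUE_ALWAYS` plays a similar role by letting age accrue even while a job is ineligible. The `age_basis` enum and `job_age` function below are hypothetical. The eligible time is passed in as a value, assumed to come from per-job tracking that excludes held periods.

```cpp
#include <algorithm>

// Hypothetical configuration option: which clock the age factor uses.
enum class age_basis {
    eligible,  // default: only time the job was eligible to be scheduled
    submit     // Slurm-like alternative: time since submission
};

// Choose the age in seconds according to the configured basis.
// eligible_seconds is assumed to come from per-job tracking of the
// time spent eligible (e.g. excluding periods held at urgency 0).
double job_age (age_basis basis, double t_submit, double eligible_seconds,
                double now)
{
    if (basis == age_basis::submit)
        return std::max (0.0, now - t_submit);
    return std::max (0.0, eligible_seconds);
}
```

Keeping `eligible` as the default matches the behavior suggested above, while `submit` would be an opt-in setting for sites that want Slurm-like accrual.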

4 participants