Remove 16 core options from choose_slurmConfig() #1537

Open
0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q opened this issue Feb 1, 2024 · 7 comments
Labels
code cleaning Code that could/should be cleaned up

Comments

@0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q
Member

Using 16 cores instead of 12 does not yield any benefit, so the 16-core setting was removed from the defaults (#1536). For consistency, the 16-core options should also be removed from choose_slurmConfig().

modes <- c(" 1: SLURM standby 12 nash H12 [recommended]",
" 2: SLURM standby 13 nash H12 coupled",
" 3: SLURM standby 16 nash H12+",
" 4: SLURM standby 1 nash debug, testOneRegi, quick",
"-----------------------------------------------------------------------",
" 5: SLURM priority 12 nash H12 [recommended]",
" 6: SLURM priority 13 nash H12 coupled",
" 7: SLURM priority 16 nash H12+",
" 8: SLURM priority 1 nash debug, testOneRegi, quick",
"-----------------------------------------------------------------------",
" 9: SLURM short 12 nash H12",
"10: SLURM short 16 nash H12+",
"11: SLURM short 1 nash debug, testOneRegi, quick",
"12: SLURM medium 1 negishi",
"13: SLURM long 1 negishi",
"-----------------------------------------------------------------------",
"14: SLURM medium 12 nash - long calibration",
"15: SLURM medium 16 nash - long calibration",
"-----------------------------------------------------------------------",
"16: direct, without SLURM")

@orichters
Contributor

orichters commented Feb 5, 2024

  • As start_bundle_coupled.R does not use this function but always runs on 13 cores for nash runs, I think the 13-core options can be removed as well.
  • For coupled runs, the script determines by itself how many nodes are advisable (13 for nash mode, 1 for test and negishi).
  • Also, we have an auto option that checks whether a priority slot is free, uses priority if so and short otherwise. It repeats that check for later runs in the cascade when they are started.
  • Maybe we can simply restrict the user choice to auto (recommended), priority, standby, short, medium, long, direct?
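The proposed reduction to a QOS-only choice, including the auto behaviour described above, could look roughly like the following sketch. The names `qosChoices`, `resolveQos` and `priorityFree` are hypothetical and not taken from the REMIND code; in particular, how the real script checks for a free priority slot is not shown in this thread, so that check is passed in as a plain logical here.

```r
# Hypothetical sketch of a QOS-only menu; not the actual choose_slurmConfig().
qosChoices <- c("auto", "priority", "standby", "short", "medium", "long", "direct")

# Resolve the user's choice to a concrete QOS.
# `priorityFree` stands in for whatever check the real code performs
# to find out whether a priority slot is currently available.
resolveQos <- function(choice, priorityFree) {
  stopifnot(choice %in% qosChoices)
  if (choice != "auto") {
    return(choice)
  }
  # auto: take priority if a slot is free, fall back to short otherwise
  if (isTRUE(priorityFree)) "priority" else "short"
}

resolveQos("auto", priorityFree = TRUE)
resolveQos("auto", priorityFree = FALSE)
resolveQos("standby", priorityFree = TRUE)
```

Because the resolution is a function of the current slot availability, it can be called again each time a later run in a cascade is started.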

@0UmfHxcvx5J7JoaOhFSs5mncnisTJJ6q
Member Author

Maybe we can simply restrict the user choice to auto (recommended), priority, standby, short, medium, long, direct?

In my opinion, "direct" should not be an option. On the cluster, computation loads should always go to the compute nodes, not the login nodes. Locally, the entire thing is bypassed anyhow.

@dklein-pik
Contributor

  • Maybe we can simply restrict the user choice to auto (recommended), priority, standby, short, medium, long, direct?

There are two separate settings here: the "quality of service" (qos), i.e. priority etc., and the --tasks-per-node setting. As you wrote, for coupled runs the latter is determined automatically, but I could not find anything similar for standalone runs. If we let the user choose only the qos, we need to add code that determines the number of tasks automatically. Did I understand you correctly, and do you agree?
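A minimal sketch of what such automatic task selection for standalone runs might look like, assuming the core counts from the menu in the issue description (12 for nash, 1 for debug, testOneRegi and negishi). The function names and the `optimization` and `debug` arguments are hypothetical; the real code would have to read the equivalent information from the run configuration.

```r
# Hypothetical sketch: pick the number of tasks from the run type,
# so the user only has to choose the QOS.
tasksForStandalone <- function(optimization, debug = FALSE) {
  # Parallel nash runs use 12 tasks; everything else
  # (debug, testOneRegi, negishi) runs on a single task.
  if (identical(optimization, "nash") && !isTRUE(debug)) 12L else 1L
}

# Combine the user-chosen QOS with the automatically determined task count.
slurmOptions <- function(qos, optimization, debug = FALSE) {
  paste0("--qos=", qos,
         " --tasks-per-node=", tasksForStandalone(optimization, debug))
}

slurmOptions("short", "nash")
slurmOptions("priority", "negishi")
slurmOptions("standby", "nash", debug = TRUE)
```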

@orichters
Contributor

orichters commented Feb 6, 2024

Yes, fully agree, that part of the code is still missing. We also have to keep in mind that some files in config/tests use their own special slurm settings (particularly --wait).

@orichters
Contributor

As noticed by @robertpietzcker, make test currently requires a priority slot which can be annoying. We should replace that by auto, I think, once that is supported.

@orichters
Contributor

We might also want to let the user limit the slurm time, which is particularly useful ahead of upcoming cluster closures...

@orichters
Contributor

We might also want to let the user limit the slurm time, which is particularly useful ahead of upcoming cluster closures...

... which should also be used to limit the AMT runtimes on standby to a reasonable length below 7 days.
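One way the time limit could be passed on is sketched below, assuming it is simply appended to the slurm option string via sbatch's `--time` option in its `days-hours:minutes:seconds` form. The helper name `addTimeLimit` and its arguments are hypothetical, and the 6-day value in the usage lines is only an illustration of "below 7 days".

```r
# Hypothetical sketch: append an optional wall-clock limit to slurm options.
addTimeLimit <- function(slurmOptions, days = NULL, hours = 0) {
  # No limit requested: leave the options untouched.
  if (is.null(days)) {
    return(slurmOptions)
  }
  stopifnot(days >= 0, hours >= 0, hours < 24)
  # sbatch accepts --time in the form days-hours:minutes:seconds
  paste0(slurmOptions,
         sprintf(" --time=%d-%02d:00:00", as.integer(days), as.integer(hours)))
}

addTimeLimit("--qos=standby --tasks-per-node=12", days = 6)
addTimeLimit("--qos=short --tasks-per-node=12")
```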

4 participants