A Banking/Resource Allocation (Service Unit) tracking system for the SLURM job scheduler based upon slurm_bank created by Barry Moore (2017).
Developed on Python 3.6.8. The eventual plan is to update the popen usage for Python 3.7.
- Why?
- How?
- Prerequisites
- Accounts and Associations
- Setup
- Usage
- Checking (Cron)
- Dumping the DB
- Useful SLURM commands
We needed a simple and robust banking system for SLURM. Barry Moore's slurm_bank met these criteria, so work was undertaken to build upon it and create this project.
In this version, Python 3 updates have been made and email notifications from the program itself are currently removed.
Why are email notifications removed? We plan to use another (external) system to keep track of project proposal end dates and to send email when thresholds are reached.
A Python program is used and data stored in an sqlite file.
Using the existing associations in your SLURM database, we use the RawUsage from sshare to monitor service units (CPU hours) on the cluster. From the documentation:
Raw Usage
The number of cpu-seconds of all the jobs that charged the account by the user.
This number will decay over time when PriorityDecayHalfLife is defined.
PriorityDecayHalfLife
This controls how long prior resource use is considered in determining how
over- or under-serviced an association is (user, bank account and cluster) in
determining job priority. The record of usage will be decayed over time, with
half of the original value cleared at age PriorityDecayHalfLife. If set to 0 no
decay will be applied. This is helpful if you want to enforce hard time limits
per association. If set to 0 PriorityUsageResetPeriod must be set to some
interval.
Therefore, in your Slurm configuration you will need:
PriorityDecayHalfLife=0-00:00:00 #No decay will be applied. This is helpful if you want to enforce hard time limits per association.
PriorityUsageResetPeriod=NONE #Never clear historic usage. The default value.
AccountingStorageEnforce=associations,limits,qos,safe #If you don't set the configuration parameters that begin with "AccountingStorage" then accounting information will not be referenced or recorded
slurm_bank.py takes care of resetting SLURM's RawUsage for the account in question. The bank has two limits:
- A service unit limit: How many compute hours is an account allowed to use?
  - ENFORCED by default (typically via cron script). This is the primary use case and reason for the program.
- A project date limit: How long does the proposal last?
  - NOT ENFORCED (typically via cron script), but the capability is present, minus emailing. We plan to manage this elsewhere.
Other:
- The bank's three month check (check 90 days before project end) is dormant here. Again, we plan to check externally.
- Upper and lower SU check limits are defined. These don't result in an email, but they do result in a DB value change. Again, we plan to mail externally.
- Python3 (tested on 3.6.8). Requirements file for pip3 included.
- dataset: "databases for lazy people"
- docopt: "command line arguments parser, that will make you smile"
- datafreeze: Dump (freeze) SQL query results from a database. As per https://dataset.readthedocs.io/en/latest/api.html, datafreeze is a separate module from dataset; see the Data Export section.
- SMTP: NOT required, we plan to use external mechanism for any notifications
- sqlite for the db_print.sh script
- SLURM: tested with 19x
In your SLURM configuration it is envisaged that you will form a tree where multiple users are associated with an account (project), e.g.:
Account  User   RawShares  NormShares  RawUsage  EffectvUsage  FairShare
------------------------------------------------------------------------
test1           parent     0.025000    2686197   0.999999
test1    user1  parent     0.025000    0         0.000000      0.545455
test1    user2  parent     0.025000    2587994   0.963441      0.545455
test1    user3  parent     0.025000    98202     0.036558      0.545455
Above we see the test1 account has user members user{1..3}. Usage by user jobs submitted on the test1 account will propagate/accumulate, and in this example it is test1's SUs in the bank/DB that will be compared to the overall RawUsage stored by SLURM accounting.
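The comparison described above can be sketched as follows. This is a minimal illustration, not the code in slurm_bank.py. It assumes sshare was run with pipe-delimited output (for example `sshare -a -P -n -o Account,User,RawUsage`), and the helper names `account_raw_usage` and `cpu_seconds_to_hours` are hypothetical. The account-level row is assumed to be the one with an empty User field, and RawUsage is converted from CPU-seconds to CPU-hours by dividing by 3600.

```python
def cpu_seconds_to_hours(raw_usage_seconds):
    """Convert RawUsage (CPU-seconds) to CPU-hours, i.e. service units."""
    return raw_usage_seconds / 3600.0


def account_raw_usage(sshare_text):
    """Return {account: RawUsage in seconds} from 'Account|User|RawUsage' rows.

    Only account-level rows (empty User field) are kept, because user
    usage already accumulates into the account's RawUsage.
    """
    usage = {}
    for line in sshare_text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 3 or not fields[2].isdigit():
            continue  # skip blank lines, headers and malformed rows
        account, user, raw = fields[0], fields[1], fields[2]
        if user == "":
            usage[account] = int(raw)
    return usage


# Sample rows mirroring the test1 example above (assumed output format).
sample = """test1||2686197
test1|user1|0
test1|user2|2587994
test1|user3|98202"""

usage = account_raw_usage(sample)
print(usage)
print(round(cpu_seconds_to_hours(usage["test1"]), 2))
```

With the example figures, test1's 2686197 CPU-seconds correspond to roughly 746 CPU-hours, which is the number that would be compared against the SUs held in the bank.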
In a project-centric regime, it is assumed you will provide SUs at the project level of the tree. Your tree may look something like:
Physics - example Department or Organisation or even a sublevel of those e.g. Project category
|
test1 - Project (owned by PI) < set SU's against this entity/account
|
User1
User2
User3
- Clone this repo/code on the SLURM master node, e.g. into a new directory such as /etc/slurm_bank
- Make the SLURM user (not root!) both the owner of the program and the user that runs it.
- py_sb_settings.py is used to set the bank's behaviour and file locations for the python code.
- env.sh is used primarily to set up vars for the slurm_bank_cron.sh cron checks. It is also used by the db_print.sh script.
In SLURM you will need to set up billing per partition (slurm.conf), e.g. within the partition definition:
Example compute:
TRESBillingWeights="CPU=1.0,Mem=0.25G,GRES/gpu=0.0"
Example GPU:
TRESBillingWeights="CPU=1.0,Mem=0.25G,GRES/gpu=1.0"
Here, CPU=1.0 means 1 service unit per hour to use 1 core and GRES/gpu=1.0 means 1 service unit per hour to use 1 GPU card.
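As a worked example of the weights above, the sketch below estimates the service units charged to a job. It assumes Slurm's default behaviour of summing each allocated TRES multiplied by its weight (that is, PriorityFlags=MAX_TRES is not set), and that Mem=0.25G means 0.25 per GB of allocated memory. The function name and the job sizes are hypothetical and chosen only for illustration.

```python
def service_units(cpus, mem_gb, gpus, hours, weights):
    """Estimate SUs as (sum of weighted TRES) * elapsed hours."""
    billing = (cpus * weights["cpu"]
               + mem_gb * weights["mem_per_gb"]
               + gpus * weights["gpu"])
    return billing * hours


# Weights taken from the two TRESBillingWeights examples above.
COMPUTE = {"cpu": 1.0, "mem_per_gb": 0.25, "gpu": 0.0}
GPU = {"cpu": 1.0, "mem_per_gb": 0.25, "gpu": 1.0}

# 4 cores and 8 GB for 2 hours on the compute partition: (4 + 2) * 2
print(service_units(4, 8, 0, 2, COMPUTE))  # 12.0
# The same job plus 1 GPU on the GPU partition: (4 + 2 + 1) * 2
print(service_units(4, 8, 1, 2, GPU))      # 14.0
```

Note that with a non-zero Mem weight, allocated memory contributes to the charge in addition to cores and GPUs.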
After setup of py_sb_settings.py and env.sh ...
Typically most operations will take place through the slurm_bank_cron.sh cron checks.
slurm_bank.py is used to manage/view SU balances for accounts stored in the DB and to release accounts that were held for exceeding their SUs.
db_print.sh is a simple script that'll quickly tell you what's going on overall by printing the entire DB table. Also consult the cron logs.
An account will be held if RawUsage exceeds the SUs in the bank DB.
If the account is held in SLURM you'll see an entry in the GrpTRESMins column e.g.:
Account  User   RawShares  NormShares  RawUsage  EffectvUsage  FairShare  GrpTRESMins
-------------------------------------------------------------------------------------
test1           parent     0.025000    2686197   0.999999                 cpu=0
test1    user1  parent     0.025000    0         0.000000      0.545455
test1    user2  parent     0.025000    2587994   0.963441      0.545455
test1    user3  parent     0.025000    98202     0.036558      0.545455
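A held account can also be detected programmatically. The sketch below is illustrative only: it assumes pipe-delimited output with the columns Account, User and GrpTRESMins (for example from `sshare -a -P -n -o Account,User,GrpTRESMins`), and `held_accounts` is a hypothetical helper, not part of this project.

```python
def held_accounts(text):
    """Return accounts whose account-level row has cpu=0 in GrpTRESMins."""
    held = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        account, user, limits = fields[0], fields[1], fields[2]
        # GrpTRESMins may list several TRES, e.g. "cpu=0,mem=100".
        if user == "" and "cpu=0" in limits.split(","):
            held.append(account)
    return held


# Assumed sample output: test1 is held, test2 is not.
sample = """test1||cpu=0
test1|user1|
test2||
test2|user4|"""

print(held_accounts(sample))  # ['test1']
```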
To add an account and SUs, simply execute slurm_bank.py, e.g.:
./slurm_bank.py insert test1 10000
Querying immediately after would look like this:
./slurm_bank.py get_sus test1
Account test1 has 10000 SUs
The resultant DB entry would look like this:
1|test1|10000|2022-04-08|0|0|0
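To make the insert and get_sus round trip concrete, here is a small self-contained sketch using Python's built-in sqlite3 module with an in-memory database. It is not the project's code (which uses the dataset library), and the table and column names (`bank`, `account`, `su_limit`, `date`) are hypothetical. The three trailing fields of the row above are not modelled because their meaning is not described here.

```python
import sqlite3
from datetime import date


def insert_account(conn, account, su_limit, today=None):
    """Add an account with its SU limit (hypothetical schema)."""
    today = today or date.today()
    conn.execute(
        "INSERT INTO bank (account, su_limit, date) VALUES (?, ?, ?)",
        (account, su_limit, today.isoformat()),
    )
    conn.commit()


def get_sus(conn, account):
    """Return the SU limit for an account, or None if it is unknown."""
    row = conn.execute(
        "SELECT su_limit FROM bank WHERE account = ?", (account,)
    ).fetchone()
    return None if row is None else row[0]


# In-memory database so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bank (id INTEGER PRIMARY KEY, account TEXT UNIQUE, "
    "su_limit INTEGER, date TEXT)"
)
insert_account(conn, "test1", 10000, date(2022, 4, 8))
print(f"Account test1 has {get_sus(conn, 'test1')} SUs")
```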
The script slurm_bank_cron.sh performs a check of service units by looping through all SLURM accounts. It is anticipated you would run this at least daily. If an account has exhausted its SUs, that account will be held. The hold mechanism is to set the account's GrpTRESMins to 0 in SLURM. This can be changed in py_sb_settings.py.
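The check can be outlined as below. This is a sketch of the logic as described here, not the contents of slurm_bank_cron.sh or slurm_bank.py. The function names are hypothetical, RawUsage is assumed to be in CPU-seconds and the SU limits in CPU-hours, and the sacctmgr command line is an assumption about how GrpTRESMins might be set to zero. The command is only constructed, never run.

```python
def accounts_over_limit(raw_usage_seconds, su_limits):
    """Return accounts whose usage in hours exceeds their SU limit."""
    over = []
    for account, limit in su_limits.items():
        used_hours = raw_usage_seconds.get(account, 0) / 3600.0
        if used_hours > limit:
            over.append(account)
    return sorted(over)


def hold_command(account):
    """Build (but do not run) a command that zeroes the account's CPU minutes."""
    # Assumed sacctmgr syntax; verify against your SLURM version before use.
    return ["sacctmgr", "-i", "modify", "account", "where",
            f"name={account}", "set", "GrpTRESMins=cpu=0"]


usage = {"test1": 2686197, "test2": 3600}  # CPU-seconds, e.g. from sshare
limits = {"test1": 500, "test2": 10000}    # SUs held in the bank DB

for account in accounts_over_limit(usage, limits):
    print(" ".join(hold_command(account)))
```

Here test1 has used about 746 CPU-hours against a limit of 500 SUs, so it is the only account selected for a hold.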
You can dump the DB to JSON and subsequently repopulate it. When repopulating, a backup JSON dump is now taken to a fixed path, which is set in py_sb_settings.py.
Additionally you can dump to CSV, but JSON is currently required to repopulate the sqlite DB, which is required for operation of the bank.
See the tree of accounts and show GrpTRESMins to see if any are held. You may wish to also consider using where account=projZZZZ
sacctmgr show assoc tree -o format=account,user,share,GrpTRESMins
See RawUsage and Share information for accounts. Also show GrpTRESMins.
sshare -a -o Account,User,RawShares,NormShares,RawUsage,EffectvUsage,FairShare,GrpTRESMins
Billing rate for running job
scontrol show job <jobID> | grep -i billing
Billing rate for completed job
sacct -X --format=AllocTRES%80,Elapsed -j <jobID>
This tool prints out the Slurm associations limits and current usage values for a user and may be worth including in your deployment:
https://github.com/OleHolmNielsen/Slurm_tools/tree/master/showuserlimits