Example job scripts for VSC Slurm
The purpose of this repository is to provide a set of Slurm job scripts with expected results.
This way we have examples we can give to users as a starting point, as well as something to test our Lua implementation against.
Explanations
Exclusive/Shared Nodes (OverSubscribe)
Having a node exclusively means that the user can use all resources of that node.
Sharing a node means that multiple users will execute jobs on the same node.
For an actual job, the sharing mode can be queried by executing:
$ scontrol show job 1030747
JobId=1030747 JobName=test-vsc5-zen3_0512_a100x2-single.sh
...
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
...
The OverSubscribe flag can have the following values:
- NO: Job allocation reserves all node resources exclusively for the job
- OK: Job allocation won't share node resources, but not all resources are reserved
- USER: Job allocation cannot share node resources with other users' jobs
- MCS: Job allocation cannot share nodes with jobs that carry a different MCS (multi-category security) label
- YES: Nodes the job runs on will be shared with other jobs
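To check the flag from a script rather than by eye, the value can be pulled out of the scontrol output. The following is a minimal sketch; the function name oversubscribe_value is a hypothetical helper, not part of Slurm or of this repository, and it assumes the OverSubscribe=... field appears as in the output shown above.

```shell
# Hypothetical helper: read `scontrol show job` output on stdin and
# print the value of the OverSubscribe field (e.g. NO, OK, USER, YES).
oversubscribe_value() {
    sed -n 's/.*OverSubscribe=\([A-Za-z]*\).*/\1/p'
}

# Usage on a cluster (job id taken from the example above):
#   scontrol show job 1030747 | oversubscribe_value
```

Keeping the parser separate from the scontrol call makes it possible to run it against saved output when comparing expected results.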
To get all resources of a node in vanilla Slurm, one has to use -N 1 --exclusive --mem=0;
otherwise the job may not get the full memory.
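As a job script, the vanilla Slurm request for a complete node might look like the sketch below. The three #SBATCH options are the ones named above; the report_allocation function is a hypothetical helper added only to show what the job received, using environment variables that Slurm sets inside a job.

```shell
#!/bin/bash
# One full node.
#SBATCH -N 1
# Do not share the node with other jobs.
#SBATCH --exclusive
# Request all memory of the node.
#SBATCH --mem=0

# Hypothetical helper: print a one-line summary of the allocation.
# Falls back to "unknown" when run outside of a Slurm job.
report_allocation() {
    echo "job=${SLURM_JOB_ID:-unknown} nodes=${SLURM_JOB_NODELIST:-unknown} cpus_on_node=${SLURM_CPUS_ON_NODE:-unknown}"
}

report_allocation
```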
In VSC Slurm the Lua scripts automatically set shared/oversubscribe=0,
so a full node allocation (-N) is always exclusive to the job.
For 'partial' allocations (no -N specified) the shared flag is not touched. Instead, the SingleCore feature is set so that shared jobs go to separate, designated nodes that expose the SingleCore feature.
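A 'partial' allocation could be sketched as follows. The resource sizes are made-up values, and has_singlecore_feature is a hypothetical helper. It also assumes that the feature set by the Lua scripts shows up in the Features= field of scontrol show job, which has not been verified here.

```shell
#!/bin/bash
# No -N given, so this is a 'partial' allocation.
#SBATCH --ntasks=1
# Hypothetical sizes, adjust to the actual job.
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G

# Hypothetical helper: read `scontrol show job` output on stdin and
# succeed if the Features= field mentions SingleCore.
has_singlecore_feature() {
    grep -Eq '(^|[[:space:]])Features=[^[:space:]]*SingleCore'
}

# Only query Slurm when actually running inside a job.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    if scontrol show job "$SLURM_JOB_ID" | has_singlecore_feature; then
        echo "SingleCore feature is set"
    else
        echo "SingleCore feature is not set"
    fi
fi
```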