
Example job scripts for VSC Slurm

The purpose of this repository is to have a set of Slurm job scripts with expected results.

This way we have examples we can give to users as a starting point, as well as something to test our Lua implementation against.

Explanations

Exclusive/Shared Nodes (OverSubscribe)

Having a node exclusively means that the user can use all resources the node provides.

Sharing a node means that multiple users may execute jobs on the same node.

For an actual job this can be queried by executing:

$ scontrol show job 1030747
JobId=1030747 JobName=test-vsc5-zen3_0512_a100x2-single.sh
   ...
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   ...

The OverSubscribe flag can have the following values:

  • NO: Job allocation reserves all node resources exclusively for the job
  • OK: Job allocation won't share node resources, but not all resources are reserved
  • USER: Job allocation cannot share node resources with other users' jobs
  • MPS: Job allocation cannot share node hardware with other users' jobs that also require shared hardware (?)
  • YES: Nodes the job runs on will be shared with other jobs
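As a quick sanity check, the flag can be pulled out of scontrol output with standard text tools; a minimal sketch, using the sample line from the example output above:

```shell
# Extract the OverSubscribe flag from a captured scontrol line.
# The sample line is copied from the example output above.
line='OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)'
flag=$(grep -oE 'OverSubscribe=[A-Z]+' <<<"$line" | cut -d= -f2)
echo "$flag"   # prints OK
```

On a live system the same pattern can be applied to the output of scontrol show job <jobid> directly.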

To get all of a node's resources in a vanilla Slurm setup, one has to use -N 1 --exclusive --mem=0; otherwise the job may not get the full memory.
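A minimal sketch of such a full-node job script for a vanilla Slurm setup (the job name and workload are placeholders, not taken from this repository):

```shell
#!/bin/bash
#SBATCH --job-name=full-node-example   # placeholder name
#SBATCH -N 1                           # one whole node
#SBATCH --exclusive                    # reserve the node exclusively for this job
#SBATCH --mem=0                        # request all memory on the node

# placeholder workload
srun hostname
```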

In VSC Slurm the Lua script automatically sets shared/oversubscribe=0, so a full node allocation (-N) is always exclusive to the job executing it.

For 'partial' allocations (no -N specified) the shared flag is not touched; instead, the SingleCore feature is set so that shared jobs go to separate designated nodes that expose the SingleCore feature.
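For comparison, a partial allocation might look like the following sketch (the job name, command, and core/memory numbers are illustrative assumptions); without -N, the Lua script would route such a job to a SingleCore node:

```shell
#!/bin/bash
#SBATCH --job-name=partial-example    # placeholder name
# No -N here: this is a partial allocation, so the job is not made
# exclusive; the SingleCore feature is set instead.
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4             # logical CPUs; illustrative value
#SBATCH --mem=8G                      # illustrative value

srun ./my_program                     # placeholder command
```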