Other SLURM tasks

Displaying resources available in the cluster

When running jobs, it can be useful to check what resources are available in the cluster and request resources according to availability. Your sysadmin has created an alias, ghpcinfo, to easily get a summary of the resources available in GHPC. It aliases the SLURM command sinfo to present the information in an easy-to-read format.

[root@console1 ~]# ghpcinfo
PARTITION    AVAIL TIMELIMIT    CPUS(A/I/O/T)  S:C:T    FREE_MEM       NODELIST
nav          up    45-12:00:00  34/158/0/192   2:8:2    81917-214799   sky[001-005,014]
ghpc_v1      up    45-12:00:00  356/26/50/432  2:6:2    46580-242633   has[705-708,710-712,802-803,805-806,902-908]
ghpc_v2      up    45-12:00:00  110/50/32/192  2:8:2    71025-361632   sky[006-009,012-013]
ghpc_v3      up    45-12:00:00  0/256/0/256    2:8:2    769057-769449  cas[1-8]

Where,

there are four partitions (queues), namely ghpc_v1 (the default), ghpc_v2, ghpc_v3 and nav.

CPUS(A/I/O/T) stands for CPUs (Allocated/Idle/Other/Total).

S:C:T stands for Sockets:Cores:Threads. 2:6:2 indicates that the server has 2 sockets, each socket has a 6-core CPU, and each core has 2 hyperthreads, totaling 24 logical CPUs per server.
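For reference, an alias like ghpcinfo can be built with sinfo's --format (-o) option. The exact format string used on GHPC is your sysadmin's choice; the following is a sketch that reproduces the columns shown above using standard sinfo field specifiers:

```shell
# Hypothetical reconstruction of the ghpcinfo alias; the real definition
# on GHPC may differ. Field specifiers: %P=partition, %a=availability,
# %l=time limit, %C=CPUs as A/I/O/T, %z=S:C:T, %e=free memory, %N=nodelist.
alias ghpcinfo='sinfo --format="%.12P %.5a %.12l %.14C %.8z %.13e %N"'
```

Adding such an alias to your shell startup file (e.g. ~/.bashrc) makes it available in every session.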

Canceling a job

Sometimes you need to cancel a job that was submitted by mistake or with the wrong specifications.

Check your jobs using the myst alias and find the ID of the job that you want to cancel. Then cancel it using the scancel command.

asampath@console1:[~] > myst
             JOBID PARTITION     NAME     USER ST       TIME  NODES  CPUS MIN_MEMORY NODELIST(REASON)
               556       nav template asampath  R       0:07      1     2        10G c09b03.ghpc.au.dk
               557       nav template asampath  R       0:04      1     2        10G c09b03.ghpc.au.dk
asampath@console1:[~] > scancel 556
asampath@console1:[~] > myst
             JOBID PARTITION     NAME     USER ST       TIME  NODES  CPUS MIN_MEMORY NODELIST(REASON)
               557       nav template asampath  R       0:11      1     2        10G c09b03.ghpc.au.dk
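scancel also accepts filters, which is handy when cleaning up several jobs at once. The examples below use standard scancel options; the job IDs and job name are taken from the transcript above and are placeholders for your own jobs:

```shell
# Cancel several jobs by ID in one go
scancel 556 557

# Cancel all of your own jobs (regular users can only cancel
# jobs they own)
scancel -u $USER

# Cancel only your pending jobs, leaving running ones untouched
scancel -u $USER --state=PENDING

# Cancel all of your jobs with a given name
scancel --name=template
```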

Modifying Time limit of a running/pending job

SLURM does not allow regular users to increase the time limit of a running job. If you submitted a job with a run time of x hours and realize that it might need x+2 hours to finish, you can email your sysadmin at any time and request an increase to the job's time limit. Do not expect an immediate response, but if your sysadmin happens to read your email in time, the change can be made easily.

When you send such an email, specify the job ID and the requested time limit in plain text, and do NOT send screenshots.
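On the admin side, the change itself is a one-line scontrol command, which is why it is quick to apply once your email is read. The job ID and time value below are placeholders:

```shell
# Run by the sysadmin (requires SLURM operator/administrator privileges).
# Sets the total time limit of job 1234 to 12 hours.
scontrol update JobId=1234 TimeLimit=12:00:00
```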

Moving a job to the top of your queue

At times, you may need a job to run with higher priority than the rest of your jobs already waiting in the queue. You can move any of your jobs to the top of your queue using:

scontrol top <jobid>
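Note that scontrol top only reorders jobs relative to your own pending jobs; it does not let you jump ahead of other users. A sketch, using a job ID from the earlier example as a placeholder and the standard squeue %Q (priority) field:

```shell
# Inspect the current scheduling priority of your pending jobs
squeue -u $USER --state=PENDING -o "%.10i %.10Q %.12j"

# Move job 557 to the top of your own queue
scontrol top 557
```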

Resubmit a job

If you'd like to resubmit a job with the same parameters:

scontrol requeue <jobid>
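If you want the requeued job to stay pending until you release it (for example, to fix an input file first), scontrol also provides requeuehold and release. The job ID below is a placeholder:

```shell
# Requeue job 557 and hold it in the pending state
scontrol requeuehold 557

# Release it so the scheduler can start it again
scontrol release 557
```

Note that requeueing applies to batch jobs submitted with sbatch; jobs launched directly with srun cannot be requeued this way.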