Other SLURM tasks
Displaying resources available in the cluster
When running jobs, it can be useful to check which resources are available in the cluster and request resources accordingly. Your sysadmin has created an alias that prints a summary of the resources available in GHPC: ghpcinfo. It wraps the Slurm command sinfo and formats its output so it is easy to read.
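The exact definition of ghpcinfo is up to your sysadmin. As a rough sketch, an alias along the following lines would produce the same columns with plain sinfo. The format string is an assumption inferred from the column headers, not the actual GHPC definition.

```shell
# Hypothetical reconstruction of ghpcinfo; the real alias on GHPC may differ.
# sinfo -o format fields used here:
#   %R  partition name            %a  availability (up/down)
#   %l  partition time limit      %C  CPUs as allocated/idle/other/total
#   %z  sockets:cores:threads per node
#   %e  free memory per node (MB) %N  node list
alias ghpcinfo='sinfo -o "%R %a %l %C %z %e %N"'
```

Because an alias is plain text substitution, extra options are passed on to sinfo, so something like ghpcinfo -p nav would limit the output to one partition.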
# ghpcinfo
PARTITION AVAIL TIMELIMIT CPUS(A/I/O/T) S:C:T FREE_MEM NODELIST
zen4 up 45-12:00:00 460/692/0/1152 2:32:2 49704-1118296 epyc[01-09]
nav_zen4 up 45-12:00:00 14/242/0/256 2:32:2 1350332-1520405 epyc[10-11]
ghpc up 45-12:00:00 592/1008/0/1600 2:8+:2 49704-1118296 cas[1-8],epyc[01-09],sky[006-008,011-013]
ghpc_short up 1-01:00:00 8/24/0/32 2:8:2 357323 sky008
nav up 45-12:00:00 14/434/0/448 2:8+:2 302291-1520405 epyc[10-11],sky[001-005,014]
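Where,
there are five partitions (queues): ghpc (the default), zen4, ghpc_short, nav and nav_zen4.
TIMELIMIT is the maximum run time of a job in the partition, in days-hours:minutes:seconds; 45-12:00:00 means 45 days and 12 hours.
CPUS(A/I/O/T) gives the number of CPUs that are Allocated/Idle/Other/Total.
S:C:T stands for sockets:cores:threads per node. For example, 2:32:2 indicates that the server has 2 sockets, each with 32 CPU cores, and each core has 2 hyperthreads, for a total of 2 × 32 × 2 = 128 logical CPUs per server.
FREE_MEM is the free memory per node in megabytes, shown as a min-max range when the nodes differ.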
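Because CPUS(A/I/O/T) packs four numbers into one column, it can help to pull out just the idle count when deciding where to submit. The helper idle_cpus below is a hypothetical sketch that assumes the ghpcinfo column layout shown above (CPUS(A/I/O/T) as the fourth column) and reads that output on standard input.

```shell
# Hypothetical helper: print "<partition> <idle CPUs>" for each partition.
# Assumes ghpcinfo-style input where column 4 is CPUS(A/I/O/T).
idle_cpus() {
  # NR > 1 skips the header line; split the A/I/O/T field on "/"
  # and print the second value, which is the idle CPU count.
  awk 'NR > 1 { split($4, cpu, "/"); print $1, cpu[2] }'
}
```

Run it as ghpcinfo | idle_cpus. For the output above it would print, among others, zen4 692 and nav 434.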
Canceling a job
Sometimes you need to cancel a job that was submitted by mistake or with the wrong specifications.
Check your jobs with the myst alias to find the ID of the job you want to cancel, then cancel it with the scancel command.
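myst is another site alias. Judging from the columns it prints, it is most likely squeue limited to your own jobs. A hypothetical equivalent could look like this; the field widths are guesses.

```shell
# Hypothetical reconstruction of myst; the real GHPC alias may differ.
# squeue -o format fields used here:
#   %i job ID   %P partition   %j job name   %u user   %t state (short)
#   %M time used   %D nodes   %C CPUs   %m minimum memory
#   %R node list, or the reason a pending job is still waiting
alias myst='squeue -u "$USER" -o "%.8i %.9P %.10j %.10u %.2t %.10M %.6D %.5C %.10m %R"'
```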
asampath@console1:[~] > myst
JOBID PARTITION NAME USER ST TIME NODES CPUS MIN_MEMORY NODELIST(REASON)
556 nav template asampath R 0:07 1 2 10G c09b03.ghpc.au.dk
557 nav template asampath R 0:04 1 2 10G c09b03.ghpc.au.dk
asampath@console1:[~] > scancel 556
asampath@console1:[~] > myst
JOBID PARTITION NAME USER ST TIME NODES CPUS MIN_MEMORY NODELIST(REASON)
557 nav template asampath R 0:11 1 2 10G c09b03.ghpc.au.dk
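scancel also accepts several job IDs at once and can select jobs by attributes instead of IDs. For example, scancel 556 557 would cancel both jobs listed above, and scancel -u $USER -t PENDING cancels all of your pending jobs. The helper below is a hypothetical wrapper for one common case, cancelling every pending job you have in a single partition. Check with myst before running anything that cancels more than one job.

```shell
# Hypothetical helper: cancel all of *your* pending jobs in one partition.
# Usage: cancel_pending <partition>, e.g. cancel_pending nav
cancel_pending() {
  if [ -z "$1" ]; then
    echo "usage: cancel_pending <partition>" >&2
    return 1
  fi
  # -u limits the selection to your own jobs, -t PENDING leaves running
  # jobs alone and -p restricts it to the given partition.
  scancel -u "$USER" -t PENDING -p "$1"
}
```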
Modifying the time limit of a running or pending job
Slurm does not allow users to increase the time limit of their jobs. If you submitted a job with a run time of x hours and realize that it may need x+2 hours to finish, you can email your sysadmin at any time and ask for the time limit to be increased. Do not expect an immediate response, but if your sysadmin reads your email in time, this is easy to do.
When you send the email, state the job ID and the new time limit as text, and do NOT send screenshots.
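To put the job details in your email as text, you can copy them straight from squeue. The helper below is a hypothetical sketch that prints the job ID, partition, current time limit and remaining time for one job, ready to paste into the message.

```shell
# Hypothetical helper: print a job's time limit details as plain text.
# Usage: job_timelimit <jobid>
job_timelimit() {
  # -h drops the header; %i job ID, %P partition,
  # %l time limit, %L time left (both as days-hours:minutes:seconds)
  squeue -h -j "$1" -o "%i %P %l %L" |
    awk '{ printf "Job %s (partition %s): time limit %s, time left %s\n", $1, $2, $3, $4 }'
}
```

Add the new time limit you need to this line before sending it.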
Moving a job to the top of your queue
At times, you may need a job to run before the rest of your jobs already waiting in the queue. You can move any of your pending jobs to the top of your queue using:
scontrol top <jobid>
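scontrol top also accepts a comma-separated list of job IDs. According to the Slurm scontrol documentation, it only reorders your own jobs that share the same partition, account and QOS. The hypothetical wrapper below simply joins its arguments with commas.

```shell
# Hypothetical helper: move several of your jobs to the top of your queue.
# Usage: top_jobs 558 559   (runs: scontrol top 558,559)
top_jobs() {
  # Setting IFS to a comma for this function makes "$*" join the
  # arguments with commas instead of spaces.
  local IFS=,
  scontrol top "$*"
}
```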
Resubmit a job
To rerun a batch job with the same parameters, requeue it. The job keeps its job ID and goes back to the pending state:
scontrol requeue <jobid>