Resource limits

Limit / Queue                                          GHPC            ZEN4            nav_zen4        nav
Max CPU cores a job can request                        32              128             128             32
Default CPU cores if not specified by user             2               2               2               2
Max memory a job can request                           740 GiB         1.5 TiB         1.5 TiB         385 GiB
Default memory if not specified by user                11.7 GiB/core   11.75 GiB/core  11.7 GiB/core   11.75 GiB/core
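As a concrete illustration of the table above, the sketch below is a hypothetical pre-submit helper (not a site-provided tool) that validates a request against the ghpc queue's figures: a 32-core cap, a 740 GiB memory cap, and defaults of 2 cores and 11.7 GiB/core when nothing is specified.

```shell
#!/bin/bash
# Hypothetical check of a job request against the ghpc queue limits
# from the table above: 32 cores max, 740 GiB memory max,
# defaults of 2 cores and 11.7 GiB/core when unspecified.

check_request() {
    local cores=${1:-2}      # default core count when not specified
    local mem_gib=$2         # requested memory in GiB (empty = default)

    # Default memory is allocated per core (11.7 GiB/core on ghpc).
    # bash has no floats, so work in tenths of a GiB (117 = 11.7).
    if [ -z "$mem_gib" ]; then
        mem_gib=$(( cores * 117 / 10 ))
    fi

    if [ "$cores" -gt 32 ]; then
        echo "rejected: $cores cores exceeds the 32-core cap"
        return 1
    fi
    if [ "$mem_gib" -gt 740 ]; then
        echo "rejected: ${mem_gib} GiB exceeds the 740 GiB cap"
        return 1
    fi
    echo "ok: $cores cores, ${mem_gib} GiB"
}
```

For example, `check_request 16 128` passes, while `check_request 64` is rejected because it exceeds the 32-core cap.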

Fair usage limits:

As resourceful as the cluster is, it would be unfair for a single user to overwhelm the resource pool at the cost of other users' requests, so fair usage limits are in place. The following limits apply to all users by default. If you reach a limit, your further jobs will wait in the queue until your prior jobs complete and return their occupied resources to the pool.

Maximum # of CPU cores a user can utilise as part of their running jobs = 72

Maximum amount of memory a user can reserve at any point in time = 768 GiB

Maximum number of jobs a user can have (running + pending) in the system at a time = 144
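To see how much of these allowances your jobs currently consume, standard Slurm `squeue` queries can report per-user totals. The format strings below are illustrative choices, not site-mandated ones:

```shell
# Count your running + pending jobs (the 144-job cap counts both).
squeue -u "$USER" -h | wc -l

# List CPUs and memory per running job, to compare against the
# 72-core and 768 GiB per-user caps.
squeue -u "$USER" -t RUNNING -h -o "%.10i %.5C %.10m"
```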

What if a user hits one of the limits above?

Their jobs will be queued and will get a chance to run only after their currently running jobs relinquish the resources so that the limits could still be satisfied.

For example, if a job is made to wait because of a user's memory limit, it would show up as below.

asampath@c07b12:[~] > myst
             JOBID PARTITION     NAME     USER ST     TIME_LIMIT       TIME  NODES  CPUS MIN_MEMORY NODELIST(REASON)
              3945   ghpc        bash asampath PD       12:00:00       0:00      1     1       220G (QOSMaxMemoryPerUser)

How do I know if I hit any of the limits?

The myst and squeue commands will clearly state why your jobs are pending and which limits they are waiting to satisfy.
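For scripted checks, the pending reason can be extracted from the output directly. The sketch below parses a captured line in the style of the sample above; in practice you would feed it live output, e.g. from `squeue -u "$USER" -t PD -h -o "%i %r"`:

```shell
# Extract the pending reason from a line of squeue-style output.
# The sample line mirrors the myst output shown earlier.
sample='3945   ghpc        bash asampath PD       12:00:00       0:00      1     1       220G (QOSMaxMemoryPerUser)'

# The reason is the parenthesised last field; strip the parentheses.
reason=$(echo "$sample" | awk '{gsub(/[()]/, "", $NF); print $NF}')
echo "$reason"
```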

What if I need an exception?

Write an email to your sysadmin giving a convincing reason why you need extra resources.