GPU Node: Configuration and Access Guide
The GHPC cluster now includes GPU resources designated exclusively for QGG users. This node is intended for QGG group members who require GPU resources for computationally intensive tasks, such as Deep Learning and AI research. The GPU server was purchased as part of the recent cluster migration to meet the growing demand for advanced research in these fields.
GPU Server Configuration
The GPU server is equipped with the latest hardware to ensure optimal performance for machine learning and AI workloads:
| Specification | Details |
| --- | --- |
| GPU Model | NVIDIA L40S |
| CUDA Cores | 18,176 |
| Tensor Cores | 568 |
| Memory | 48 GB GDDR6 |
| CUDA Version | 12.2 |
The NVIDIA L40S GPU offers excellent performance for compute-heavy tasks such as training deep learning models and running large-scale simulations.
Accessing the GPU Server
To access the GPU resources, users must be part of Quentin Geissmann's project. Those who are not members but require GPU access should contact the system administrator to request approval.
Accessing the GPU Server Shell
From the login node, use the following command to start an interactive session on the GPU server:
```bash
srun -p ghpc_gpu --pty bash
```

This command opens a bash shell on the GPU server. Users must ensure they use the `ghpc_gpu` partition to access GPU nodes, as the default partition does not include GPU resources.
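Once the interactive shell is open, it can be useful to confirm that the GPU is visible before starting any work. The sketch below assumes the standard NVIDIA driver utilities are on the node's PATH (typical, but worth verifying):

```bash
# Show the GPU detected on the node, its driver and CUDA version,
# current memory usage, and any running processes
nvidia-smi

# Optional: print just the GPU name and total memory in a compact form
nvidia-smi --query-gpu=name,memory.total --format=csv
```

If `nvidia-smi` reports the NVIDIA L40S with 48 GB of memory, the session has landed on the GPU node as expected.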
Submitting GPU Jobs via SLURM
To run GPU workloads, users can submit batch jobs using SLURM. Please ensure you read the Running batch jobs page thoroughly before proceeding, as it contains essential information.
Below is a sample batch script that demonstrates how to request GPU resources and run the job.
```bash
#!/bin/bash
#--------------------------------------------------------------------------#
# Job Specifications for GPU Usage
#--------------------------------------------------------------------------#
#SBATCH -p ghpc_gpu # Name of the GPU partition
#SBATCH -N 1 # Number of nodes (DO NOT CHANGE)
#SBATCH -n 1 # Number of CPU cores
#SBATCH --mem=1024 # Memory in MiB
#SBATCH --gres=gpu:1 # Request 1 GPU (DO NOT CHANGE)
#SBATCH -J gpu_job # Job name
#SBATCH --output=slurm_%x_%A.out # STDOUT file
#SBATCH --error=slurm_%x_%A.err # STDERR file
#SBATCH -t 1:00:00 # Maximum job runtime (1 hour)
# Temporary directory creation for the job (DO NOT CHANGE)
TMPDIR=/scratch/$USER/$SLURM_JOBID
export TMPDIR
mkdir -p $TMPDIR
#=========================================================================#
# Your GPU Job
#=========================================================================#
# Activate Python virtual environment
source /usr/lib/python3.11/venv/bin/activate
# Confirm that the virtual environment is active
echo "Activated Python Virtual Environment: $(which python)"
# Execute your GPU workload
python my_gpu_script.py # Replace with the actual GPU workload
#=========================================================================#
# Cleanup (DO NOT CHANGE)
#=========================================================================#
cd $SLURM_SUBMIT_DIR
rm -rf /scratch/$USER/$SLURM_JOBID
```
In the script:
- `-p ghpc_gpu` specifies the GPU partition.
- `--gres=gpu:1` requests 1 GPU.
- The virtual environment located at `/usr/lib/python3.11/venv/bin/activate` is activated before running the GPU job.
- Replace `my_gpu_script.py` with the actual Python script for GPU computation.
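Whether the shared virtual environment contains a GPU-enabled framework such as PyTorch is not guaranteed; if it does, an optional check like the one below, placed just before the main workload in the batch script, can confirm that the job actually sees the GPU. This is a sketch under that assumption, not part of the required template:

```bash
# Optional sanity check: confirm the GPU is visible to the job.
# Assumes PyTorch is installed in the active virtual environment;
# adapt or remove this line if a different framework is used.
python -c "import torch; print('CUDA available:', torch.cuda.is_available())"
```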
Submitting the Job
To submit a job to the GPU queue, use the following command:
```bash
sbatch my_gpu_job.sh
```
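After submission, the job can be tracked with standard SLURM commands. A brief sketch (the job ID printed by `sbatch` is shown as a placeholder):

```bash
# List your pending and running jobs, including the node and partition they use
squeue -u $USER

# Inspect a specific job in more detail (replace 123456 with the real job ID)
scontrol show job 123456

# Once the job has finished, review the STDOUT/STDERR files named by the
# --output/--error patterns in the batch script
ls slurm_gpu_job_*.out slurm_gpu_job_*.err
```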
GPU Server Policy
The GPU server is designated exclusively for members of the QGG group. Users who are not members but require access to the GPU server for research should contact the system administrator to request permission. Unauthorized access is prohibited.
This server was added to the GHPC cluster during the recent migration to meet the increasing demand for Deep Learning and AI in research. QGG members working in these fields are encouraged to make full use of this resource.
Additional Information
- GPU Server Queue/Partition: `ghpc_gpu` (always specify this partition when accessing GPU resources).
For any questions regarding the use of the GPU server, users are encouraged to contact the system administrator.