GPU Node: Configuration and Access Guide

The GHPC cluster now includes a GPU node designated exclusively for QGG group members who require GPU resources for computationally intensive tasks such as Deep Learning and AI research. The server was purchased as part of the recent cluster migration to meet the growing demand for advanced research in these fields.

GPU Server Configuration

The GPU server is equipped with the following hardware to support machine learning and AI workloads:

Specification     Details
GPU Model         NVIDIA L40S
CUDA Cores        18,176
Tensor Cores      568
Memory            48 GB (GDDR6)
CUDA Version      12.2

The NVIDIA L40S GPU offers excellent performance for compute-heavy tasks such as training deep learning models and running large-scale simulations.

Accessing the GPU Server

To access the GPU resources, users must be part of Quentin Geissmann's project. Those who are not members but require GPU access should contact the system administrator to request approval.

Accessing the GPU Server Shell

From the login node, use the following command to start an interactive session on the GPU server:

srun -p ghpc_gpu --pty bash

This command will open a bash shell on the GPU server. Users must ensure they use the ghpc_gpu partition to access GPU nodes, as the default partition does not include GPU resources.
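
Once the session starts, it is worth confirming that the GPU is actually visible before launching any real work. The short check below is a minimal sketch that assumes PyTorch with CUDA support is installed in your Python environment; a plain nvidia-smi call gives the same information.

# check_gpu.py -- minimal GPU visibility check (assumes PyTorch is installed)
import torch

if torch.cuda.is_available():
    name = torch.cuda.get_device_name(0)                                  # e.g. "NVIDIA L40S"
    mem_gib = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"GPU detected: {name} ({mem_gib:.0f} GiB)")
    print(f"CUDA version used by PyTorch: {torch.version.cuda}")
else:
    print("No GPU visible -- check that the session was started on the ghpc_gpu partition")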

Submitting GPU Jobs via SLURM

To run GPU workloads, users can submit batch jobs using SLURM. Please ensure you read the Running batch jobs page thoroughly before proceeding, as it contains essential information.

Below is a sample batch script that demonstrates how to request GPU resources and run the job.

#!/bin/bash
#--------------------------------------------------------------------------#
# Job Specifications for GPU Usage
#--------------------------------------------------------------------------#
#SBATCH -p ghpc_gpu            # Name of the GPU partition
#SBATCH -N 1                   # Number of nodes (DO NOT CHANGE)
#SBATCH -n 1                   # Number of CPU cores
#SBATCH --mem=1024             # Memory in MiB 
#SBATCH --gres=gpu:1           # Request 1 GPU (DO NOT CHANGE)
#SBATCH -J gpu_job             # Job name
#SBATCH --output=slurm_%x_%A.out   # STDOUT file
#SBATCH --error=slurm_%x_%A.err    # STDERR file
#SBATCH -t 1:00:00             # Maximum job runtime (1 hour)

# Temporary directory creation for the job (DO NOT CHANGE)
TMPDIR=/scratch/$USER/$SLURM_JOBID
export TMPDIR
mkdir -p $TMPDIR

#=========================================================================#
# Your GPU Job
#=========================================================================#

# Activate Python virtual environment
source /usr/lib/python3.11/venv/bin/activate

# Confirm that the virtual environment is active
echo "Activated Python Virtual Environment: $(which python)"

# Execute your GPU workload
python my_gpu_script.py  # Replace with the actual GPU workload

#=========================================================================#
# Cleanup (DO NOT CHANGE)
#=========================================================================#
cd $SLURM_SUBMIT_DIR
rm -rf /scratch/$USER/$SLURM_JOBID

In the script:

  • -p ghpc_gpu specifies the GPU partition.
  • --gres=gpu:1 requests 1 GPU.
  • The Python virtual environment is activated by sourcing /usr/lib/python3.11/venv/bin/activate before the GPU job runs.
  • Replace my_gpu_script.py with the actual Python script for your GPU computation; a minimal example sketch follows this list.
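
For reference, below is a minimal sketch of what my_gpu_script.py might contain. It is a hypothetical placeholder, assuming PyTorch is available in the activated environment; it simply runs a small matrix multiplication on the GPU and should be replaced with your actual workload.

# my_gpu_script.py -- hypothetical placeholder workload (assumes PyTorch)
import torch

# Use the GPU if one is visible, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Running on: {device}")

# Placeholder computation: a small matrix multiplication on the selected device
a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)
c = a @ b

if device.type == "cuda":
    torch.cuda.synchronize()  # wait until all GPU kernels have finished

print(f"Result checksum: {c.sum().item():.4f}")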

Submitting the Job

To submit a job to the GPU queue, use the following command:

sbatch my_gpu_job.sh
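
SLURM responds with the assigned job ID, and the job's status can be checked with the standard SLURM command squeue, for example:

squeue -u $USER

STDOUT and STDERR from the job are written to the slurm_*.out and slurm_*.err files named in the script.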

GPU Server Policy

The GPU server is designated exclusively for members of the QGG group. Users who are not members but require access to the GPU server for research should contact the system administrator to request permission. Unauthorized access is prohibited.

This server was added to the cluster during the migration process to meet the increasing demand for Deep Learning and AI in research. QGG members working in these fields are encouraged to make full use of this powerful resource.

Additional Information

  • GPU Server Queue/Partition: ghpc_gpu (Always specify this when accessing GPU resources).

For any questions regarding the use of the GPU server, users are encouraged to contact the system administrator.