The QBiG GPU cluster is funded by the DFG within the framework of the CRC 110. It consists of three parts. The most recent addition, QBiG-III, consists of 2 nodes with 8 NVIDIA A100 GPUs. QBiG-II added 5 nodes with 8 NVIDIA P100 cards each on top of QBiG-I, bringing a peak performance of about 180 TFlops in double and about 373 TFlops in single precision. QBiG-I has a peak performance of 56 TFlops in double and 168 TFlops in single precision on 48 K20m GPUs.
The fast InfiniBand network allows users to run multi-GPU and multi-node programs. QBiG is connected to 190 TByte of RAID disk storage via a Lustre filesystem.
In addition, we provide some CPU-only nodes.
The cluster can be accessed via the frontend node ‘qbig.cluster.hiskp’ from within the HISKP VPN network only. Connect using ssh to ‘qbig.itkp.uni-bonn.de’. Every user has a directory on the frontend node in ‘/hiskp4/username’. The latter is a Lustre filesystem that is available via InfiniBand on all compute nodes.
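For example, to log in and change to your work directory (with username a placeholder for your account name):

ssh username@qbig.itkp.uni-bonn.de
cd /hiskp4/username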
Please note that the frontend node is intended for compiling and development only; do not run production jobs interactively on qbig. A few CPU slots are available on the frontend node for this purpose.
There are two MPI libraries installed, openMPI and MVAPICH2. Both can handle InfiniBand, but only the latter is compiled with GPUDirect support. However, so far I have only managed to get hybrid MPI+OpenMP jobs running with openMPI. MVAPICH2 is the default you get when invoking mpicc and mpirun. If you want to use openMPI, you need to use mpicc.openmpi and mpirun.openmpi instead. Unfortunately, the man pages currently refer to MVAPICH2 only.
In particular, this means that your application has to be recompiled for the MPI library it will run with, i.e. either openMPI or MVAPICH2. If you want to run a hybrid MPI+OpenMP application, you therefore have to compile with mpicc.openmpi.
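As a rough sketch, assuming a source file myprog.c (a placeholder), the two toolchains are used like this:

mpicc.openmpi -fopenmp -o myprog myprog.c   # openMPI build, needed for hybrid MPI+OpenMP
mpicc -o myprog myprog.c                    # MVAPICH2 build (default wrappers, GPUDirect support)

At run time the matching launcher has to be used, i.e. mpirun.openmpi for the openMPI build and mpirun for the MVAPICH2 build.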
Batch queuing is done using SLURM. The most important commands are sbatch for submitting a job, squeue for listing the jobs in the queue, and scancel for cancelling a job.
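Typical usage (job.sh and the job ID 12345 are placeholders):

sbatch job.sh        # submit the batch script
squeue -u $USER      # list your jobs in the queue
scancel 12345        # cancel the job with ID 12345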
The maximum walltime currently allowed is 36 hours; the default is one hour.
The default memory requirement is set to one GB. Please specify your memory requirements as precisely as possible! The limit is enforced strictly and jobs exceeding it will be aborted.
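Both limits can also be given on the sbatch command line, which overrides the values set in the script, e.g.:

sbatch --time=12:00:00 --mem=4G job.sh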
An example batch script for a simple (non-GPU) single-task job:

#!/bin/bash -x
#SBATCH --job-name=my-job
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --output=%x.%J.out
#SBATCH --error=%x.%J.out
#SBATCH --time=36:00:00
#SBATCH --mail-user=me@hiskp.uni-bonn.de
#SBATCH --mail-type=ALL
#SBATCH --mem=1500M
# use one OpenMP thread per CPU allocated to the task
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
# thread pinning for the Intel OpenMP runtime
export KMP_AFFINITY=balanced,granularity=fine,verbose
cd /hiskp4/username/run-dir/
srun path-to-exec/executable
cd -
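Assuming the script above is saved as my-job.sh (the name is arbitrary), it is submitted with sbatch; with the --output pattern %x.%J.out, stdout and stderr end up in a file named after the job name and job ID:

sbatch my-job.sh
ls my-job.*.out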
An example batch script for a GPU job, which in addition requests four GPUs:

#!/bin/bash -x
#SBATCH --job-name=gpujob
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=4
#SBATCH --output=%x.%J.out
#SBATCH --error=%x.%J.out
#SBATCH --time=01:00:00
#SBATCH --mail-user=me@hiskp.uni-bonn.de
#SBATCH --mail-type=ALL
# request 4 GPUs on the node
#SBATCH --gres=gpu:4
#SBATCH --mem=1G
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
export KMP_AFFINITY=balanced,granularity=fine,verbose
cd /hiskp4/username/run-dir/
srun path-to-exec/executable
cd -
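As a quick sanity check one can let the job print the GPUs it sees by adding a call to nvidia-smi before the actual executable, e.g.:

srun nvidia-smi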