HKUST SuperPOD - Apptainer (Singularity)

The Apptainer container (compatible with Singularity at both the image-format and command-line level) lets users run applications in their preferred Linux environment. By encapsulating both the operating system and the application stack in a single container image file, it makes the environment easy to modify, replicate, and transfer to any system where Apptainer (or Singularity) is installed. The image file can be run like a user application while still using the native resources of the host system, such as the InfiniBand network, GPUs/accelerators, and resource manager integration. In essence, Apptainer enables Bring-Your-Own-Environment (BYOE) computing on a multi-tenant, shared High-Performance Computing (HPC) cluster.

The apptainer and singularity commands can be used interchangeably in all of the examples below.
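As a small illustration of both points (interchangeable commands and BYOE), the sketch below compares the host's OS release information with the one inside a container. The image name ubuntu.sif is a hypothetical placeholder for any image you have built or pulled.

```shell
# OS of the host system you are logged in to.
cat /etc/os-release

# OS inside the (hypothetical) container image; both lines run the same command,
# once through apptainer and once through the singularity compatibility command.
apptainer exec ubuntu.sif cat /etc/os-release
singularity exec ubuntu.sif cat /etc/os-release
```

The last two commands should print identical output, which may differ from the host's, since the container carries its own operating system.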

Workflow

The figure above, taken from the official Apptainer documentation, describes the typical workflow for using Apptainer. In general, the workflow has two stages: building the container on a user endpoint, and executing the container in a production environment. A user endpoint is typically a local system where you have admin/root privileges, e.g., your desktop or a virtual machine, while a production environment is a shared system where you only have user privileges, e.g., any HPC cluster.

In the first stage, you build a customized container by installing applications and modifying its configuration as needed. Once the container image is ready, upload it to your home directory on the cluster to begin the second stage. In this stage, you can treat the container as an application: as with any other user application, you submit a job through SLURM to execute the container on a compute node.
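A minimal sketch of the first stage, assuming a local Linux machine with Apptainer installed and sudo rights. The file names mycontainer.def and mycontainer.sif, the Ubuntu base image, and the <username>/<superpod-login-host> values are hypothetical placeholders, not SuperPOD-specific settings.

```shell
# --- On your own machine (root/sudo available) ---
# Write a minimal definition file: start from the Ubuntu image on Docker Hub
# and install Python 3 inside the container.
cat > mycontainer.def <<'EOF'
Bootstrap: docker
From: ubuntu:22.04

%post
    export DEBIAN_FRONTEND=noninteractive
    apt-get update
    apt-get install -y --no-install-recommends python3
    rm -rf /var/lib/apt/lists/*

%runscript
    exec python3 "$@"
EOF

# Build the read-only SIF image; building from a definition file normally needs root.
sudo apptainer build mycontainer.sif mycontainer.def

# --- Upload to the cluster ---
# <username> and <superpod-login-host> are placeholders for your own account and login node.
scp mycontainer.sif <username>@<superpod-login-host>:~/
```

Once uploaded, the image can be used on the cluster like an application, e.g. apptainer exec mycontainer.sif python3 --version.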

Stage 1

Load Apptainer into your shell environment:

module load apptainer
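The Stage 2 example expects an image file named tensorflow-latest-gpu.sif. One way to obtain it, assuming it was created from the official tensorflow/tensorflow:latest-gpu image on Docker Hub and that the node has outbound internet access, is apptainer pull, which does not require root:

```shell
# Make the apptainer command available.
module load apptainer

# Convert the Docker Hub image into a local SIF file.
# Assumption: tensorflow/tensorflow:latest-gpu is the intended source image.
apptainer pull tensorflow-latest-gpu.sif docker://tensorflow/tensorflow:latest-gpu

# Sanity check: print the TensorFlow version bundled in the image.
apptainer exec tensorflow-latest-gpu.sif python -c "import tensorflow as tf; print(tf.__version__)"
```

The pull caches image layers under ~/.apptainer/cache by default; setting the APPTAINER_CACHEDIR environment variable points that cache elsewhere if your home directory quota is tight.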

Stage 2

Suppose you have downloaded the TensorFlow models repository to your home directory on SuperPOD:

git clone https://github.com/tensorflow/models.git

Test the container by training a model on the MNIST handwritten digit data set with a GPU:

apptainer exec --nv tensorflow-latest-gpu.sif python ./models/tutorials/image/mnist/convolutional.py

The "--nv" option gives the container access to the node's GPU devices by making the host's NVIDIA driver libraries available inside the container; the node must have the NVIDIA driver installed.
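A quick way to confirm that "--nv" is working, sketched here on the assumption that you are on a GPU node and that the image contains TensorFlow 2.x:

```shell
# With --nv, the host's nvidia-smi and driver libraries are bound into the
# container, so this should list the GPUs allocated to you on the node.
apptainer exec --nv tensorflow-latest-gpu.sif nvidia-smi

# Ask TensorFlow which GPUs it can see (TensorFlow 2.x API).
apptainer exec --nv tensorflow-latest-gpu.sif \
    python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```

Running the second command without --nv is a useful contrast: TensorFlow should then report an empty GPU list.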

To run the job through SLURM, put the above command in your SLURM job submission script.
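A minimal job script sketch, assuming the image and the cloned models directory are both in your home directory. The job name, time limit, and GPU count are illustrative values; SuperPOD may also require partition or account options whose values are site-specific and not shown here.

```shell
#!/bin/bash
#SBATCH --job-name=tf-mnist          # illustrative job name
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --gpus=1                     # request one GPU
#SBATCH --time=01:00:00              # illustrative wall-time limit
#SBATCH --output=tf-mnist-%j.out     # %j expands to the SLURM job ID
# Add --partition/--account lines here if the cluster requires them.

module load apptainer

# Run from the home directory so the relative paths below resolve.
cd "$HOME"
apptainer exec --nv tensorflow-latest-gpu.sif python ./models/tutorials/image/mnist/convolutional.py
```

Save it as, for example, tf-mnist.sh and submit it with sbatch tf-mnist.sh; the training output will appear in tf-mnist-<jobid>.out.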

Notes:

Test your container before submitting jobs, either in interactive mode or in batch mode. It is important to confirm that the container works as you expect, especially when your application uses GPUs.
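For interactive testing, one approach is to request a shell on a GPU compute node and open a shell inside the container. The resource values below are examples only, and any partition or account options SuperPOD requires are omitted.

```shell
# Request an interactive shell on a compute node with one GPU (example values).
srun --nodes=1 --ntasks=1 --gpus=1 --time=00:30:00 --pty bash

# Then, inside the interactive session on the compute node:
module load apptainer
apptainer shell --nv tensorflow-latest-gpu.sif
```

From the container shell you can run the training script directly and watch for errors before moving the same command into a batch script.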

Reference Website: 

Apptainer Quick Start - https://apptainer.org/docs/user/main/quick_start.html