HPC3 Cluster User Guide & FAQ

General

1. Am I eligible to apply for an account to access the HPC3 cluster for research computing?

The cluster is open to all approved university researchers, and the application must be sponsored by a principal investigator (PI), who is a faculty member of the university. The PI also needs to apply for an account if he/she would like to access the cluster. Please refer to the HPC3 cluster website for what resources are available and how to apply for an account.

2. How do I log into the HPC3 cluster?

You can access hpc3.ust.hk through the Secure Shell (SSH) command (e.g. ssh dummyuser@hpc3.ust.hk) or any SSH client (such as Bitvise or PuTTY). On the Windows platform, you can use a free SSH client such as PuTTY. You need to use the campus network or Wi-Fi (SSID: eduroam) to access the login node. If you are off campus, you can access the login node via VPN (Virtual Private Network).

3. How do I transfer files to or from the cluster?

You can use an SFTP (SSH File Transfer Protocol) client such as FileZilla on Windows to transfer files.
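
Alternatively, you can transfer files from the command line with the standard scp or sftp tools. The following is a minimal sketch; the file name and paths are examples only:

    scp myresults.tar.gz dummyuser@hpc3.ust.hk:/home/dummyuser/    # copy a local file to your home directory

    sftp dummyuser@hpc3.ust.hk    # or start an interactive SFTP session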

4. How many jobs and nodes can I run and use in the cluster?

Each compute node provides processors, memory and local disk as resources. Resource limits in the cluster are allocated based on CPU cores only, and the usage quota is group-based, shared among the members of a PI group. For your usage quota, please refer to the Cluster Resource Limits page; the limits depend on which PI group you belong to.

5. What is my disk quota?

The default disk quota for each user is 100GB for /home. The quota for the parallel file system scratch disk / archive storage depends on each PI and is shared among the members of the group (for details of the storage system, please refer to the Cluster Storage Page).

To check your parallel file system (/scratch/PI/<pi_group>) disk usage, use the command:

    beegfs-ctl --getquota --gid <pi_group>

To check your home file system (/home) disk usage, use the command:

    quota

6. How do I run jobs in the cluster?

You can compile and develop your application on hpc3.ust.hk. To run it, you have to submit jobs using SLURM, the resource management and job scheduling system of the cluster. Details can be found on the Job Scheduling System page.
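
A minimal batch job might look like the sketch below; the job name, partition name, resource numbers and program are placeholders, so check the Cluster Resource Limits page for the partitions available to your PI group:

    #!/bin/bash
    # my_job.sh - minimal example job script (names are placeholders)
    #SBATCH -J my_job        # job name
    #SBATCH -p cpu           # partition (assumption; use one available to your group)
    #SBATCH -N 1 -n 4        # 1 node, 4 CPU cores
    #SBATCH -t 01:00:00      # wall time limit

    ./my_program             # replace with your own program

Submit the script with sbatch my_job.sh and check its status with squeue -u $USER.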

Slurm

7. How do I check my submitted job status?

You can use the command squeue -u $USER to check your job status. The ST (job state) output field shows the job status. The typical states are R (running), PD (pending) and CG (completing).
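
If you need more detail on a specific job, you can also use the standard SLURM command below, where <jobid> is a placeholder for your job ID:

    scontrol show job <jobid>    # show detailed information for one job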

8. What is the meaning of the quota limits GrpJobs, GrpNodes, GrpSubmitJobs and partition WallTime on the Cluster Resource Limits webpage?

GrpJobs is the total number of jobs that can run at any given time for a PI group in a partition. GrpNodes is the total number of nodes that can be in use at any given time for a PI group in a partition. GrpSubmitJobs is the total number of jobs that can be in the system (running and waiting) at any given time for a PI group. The maximum WallTime is the maximum run time limit for jobs in the partition.

9. Why is my job pending while there is an idle node?

A possible scenario is that one or more higher priority jobs exist in the partition.

For example, suppose there is only one idle node when a high-priority job asking for 2 nodes is submitted; that job is pending. When a lower-priority job asking for 1 node is then submitted, it will also be put into the pending state because a higher-priority pending job (the earlier-submitted job asking for 2 nodes) exists. The priority is related to the submission time.
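
You can check why a job is waiting in the NODELIST(REASON) column of the squeue output; pending jobs commonly show reasons such as (Priority) or (Resources). The sketch below uses <jobid> as a placeholder:

    squeue -u $USER            # pending jobs show a reason such as (Priority) or (Resources)

    squeue -j <jobid> --start  # report the expected start time of a pending job, if available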

Software

10. Can I install software in the cluster?

In general, you can install software in your own /home/$USER directory or in the /scratch/PI/<pi_group> directory. Please note that you are responsible for the licenses and copyright of the software you install in the cluster. You should also adhere to ITSC’s Acceptable Use Policy.

11. How to install tensorflow-gpu in the HPC3 cluster?

Use Anaconda to install tensorflow-gpu in a conda environment:

    module load anaconda3

    conda create -n my_env

    source activate my_env

    conda install -c anaconda tensorflow-gpu
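
After installation, you can verify that TensorFlow detects the GPU. This check only works on a machine with a GPU device (e.g. xgpu.ust.hk or within a GPU job), and the command below assumes a TensorFlow 2.x build:

    python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"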

12. How to use Singularity in the HPC3 cluster?

Use Singularity to pull an image from Docker Hub and then run the image:

    singularity pull docker://sylabsio/lolcow

    singularity run lolcow_latest.sif

    References:

    https://docs.sylabs.io/guides/3.8/user-guide/index.html

    https://docs.sylabs.io/guides/3.8/user-guide/cli/singularity_pull.html

    https://docs.sylabs.io/guides/3.8/user-guide/cli/singularity_run.html
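
To run a container inside a SLURM batch job, a possible job script is sketched below; the resource numbers are placeholders, and the --gres and --nv options are only needed when the container uses a GPU:

    #!/bin/bash
    #SBATCH -N 1 -n 4 --gres=gpu:1           # request 1 GPU; omit --gres for CPU-only containers

    singularity run --nv lolcow_latest.sif   # --nv exposes the NVIDIA GPU inside the container; omit for CPU-only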

13. How to set up and run Python in the HPC3 cluster?

You are advised to use Anaconda to manage your Python environment (e.g. Python 3.11).

    module load anaconda3

    conda create -n my_env

    source activate my_env

    conda install python=3.11

    python --version
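
To run a Python script as a batch job with this environment, a possible job script is sketched below; the resource numbers and script name (my_script.py) are placeholders:

    #!/bin/bash
    #SBATCH -N 1 -n 4          # 1 node, 4 CPU cores

    module load anaconda3
    source activate my_env     # the environment created above
    python my_script.py        # replace with your own script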

14. How to run an application program with GPU in the HPC3 cluster?
  • For development and testing purposes, you can access xgpu.ust.hk, which is a login node with GPU devices for the HPC3 and X-GPU clusters. You can use a Secure Shell (SSH) client to connect to xgpu.ust.hk. Note that the campus wired network, Wi-Fi (SSID: eduroam) or VPN (if you are off campus) is required for the connection.
  • For SLURM job submission, you need to explicitly declare the total number of GPU devices to use in the script using a line as follows:

#SBATCH -N number_of_node -n number_of_CPU_cores --gres=gpu:number_of_GPU_devices

For example, the following line declares to use 4 CPU cores and 2 GPU devices in 1 node:
#SBATCH -N 1 -n 4 --gres=gpu:2

If you don’t declare the number of GPU devices to use with the option “--gres”, SLURM will NOT allocate any GPU device for the job and the application will NOT be able to find any available GPU device. For more details and a sample job submission script, you may refer to Example 3 on this page.
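
Putting these pieces together, a complete GPU job script might look like the sketch below; the environment name and program are placeholders, and the sample script in Example 3 remains the authoritative reference:

    #!/bin/bash
    #SBATCH -N 1 -n 4 --gres=gpu:2    # 1 node, 4 CPU cores, 2 GPU devices

    module load anaconda3
    source activate my_env            # e.g. the tensorflow-gpu environment from question 11
    python my_gpu_program.py          # replace with your own GPU application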