Partitions are work queues, each with its own set of rules/policies and its own set of compute nodes on which jobs run. The available partitions are `normal`, `large`, and `cpu`. You can run `sinfo` to list the partitions currently available on the system.
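As a minimal sketch, the partition list might be inspected from a login node as follows. `-s` (summary) and `-p` (restrict to one partition) are standard `sinfo` options. The guard is only there so the snippet degrades gracefully on a machine without the Slurm client tools, and the actual output depends on the cluster state, so none is shown.

```shell
#!/bin/bash
# sinfo is only present where the Slurm client tools are installed (e.g. a login node).
if command -v sinfo >/dev/null 2>&1; then
    # One summary line per partition: availability, time limit, node counts.
    sinfo -s
    # Node states for a single partition, here "normal".
    sinfo -p normal
else
    echo "sinfo not found: run this on a node with the Slurm client tools"
fi
```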
Resource Request Policy
- Computational resources on the HKUST SuperPOD are requested in units of H800 (80GB) GPUs. In Slurm, each GPU is associated by default with the following CPU cores and system memory:
  - 14 CPU cores with 28 threads
  - 224GB system memory
- In general, we recommend that users specify only the `--gpus` parameter for the number of GPUs and the `--nodes` parameter for the number of nodes in a job request, and let Slurm allocate the cores and memory among the nodes for optimal resource utilization.
- The `normal` partition supports job requests for mainstream GPU computation, ranging from a single H800 GPU up to a maximum of 16 GPUs.
- The `large` partition supports large multi-node job requests. Requests must be made in multiples of 8 H800 GPUs, i.e. full nodes. The minimum request is 2 nodes (16 GPUs) and the maximum is 12 nodes (96 GPUs).
- The number of nodes assigned to the `large` and `normal` partitions may vary depending on workload conditions.
- Job requests for a very large number of nodes, i.e. more than 12 nodes, must be arranged by reservation only.
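The recommendation above can be sketched as a batch script that names only the partition, node count, and GPU count, leaving cores and memory to Slurm's per-GPU defaults. This is an illustration, not an official template: the job name and the one-hour time limit are placeholders, and any site-specific options (such as a charging account) are not shown because the policy above does not describe them.

```shell
#!/bin/bash
#SBATCH --job-name=gpu-test      # placeholder name
#SBATCH --partition=normal       # 1 to 16 GPUs per the policy above
#SBATCH --nodes=1
#SBATCH --gpus=2                 # cores and memory are left to Slurm's defaults
#SBATCH --time=01:00:00          # placeholder; must stay within the partition's wall time

# With 2 GPUs, the per-GPU defaults work out to 2 x 14 = 28 cores
# and 2 x 224GB = 448GB of system memory.
echo "Job ${SLURM_JOB_ID:-unknown} running on $(hostname)"
echo "GPUs visible to the job: ${CUDA_VISIBLE_DEVICES:-none}"
```

Saved under a name of your choice (for example `job.sh`, a hypothetical filename), the script would be submitted with `sbatch job.sh`. For the `large` partition, the same sketch would use `--partition=large` with full nodes, for example `--nodes=2 --gpus=16`.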
Partition Table
Slurm Partition | large | normal | cpu |
---|---|---|---|
No. of nodes | 35 DGX nodes | 20 DGX nodes | 2 Intel nodes |
Purpose | Large-scale multi-node GPU computation | Mainstream GPU computation | Data pre-processing for GPU computation |
Max wall time | 3 days | 3 days | 12 hours |
Min resource requested per job | 16 GPUs (equivalent to 2 nodes) | 1 GPU | 1 CPU core |
Max resource requested per account | 96 GPUs (equivalent to 12 nodes) | 16 GPUs | 8 CPU cores (per job) |
Concurrent running jobs quota per user | 4 | 8 | 28 |
Queuing and running jobs limit per user | 5 | 10 | 28 |
Chargeable | Yes | Yes | No |
Interactive job | One session allowed, with a maximum wall time of 2 hours | One session allowed, with a maximum wall time of 2 hours | Not allowed |
Remarks | GPU resources must be requested in multiples of 8 (full nodes) | GPU resources can be requested in any quantity up to the maximum | No access to the /scratch directory for the time being |
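
The GPU limits in the table can be turned into a quick pre-submission check. The sketch below is a hypothetical helper, not a Slurm tool: `check_gpu_request` is a name introduced here for illustration, and it treats the per-account maximums as if they applied to a single job, which is a simplifying assumption. It only encodes the GPU counts listed above and the default of 14 cores and 224GB of memory per GPU.

```shell
#!/bin/bash
# check_gpu_request PARTITION GPUS
# Hypothetical helper: returns 0 and prints the default allocation if the request
# fits the limits in the partition table, otherwise prints a reason and returns 1.
check_gpu_request() {
    local partition="$1" gpus="$2"
    # Accept only positive integers without leading zeros.
    case "$gpus" in
        ''|*[!0-9]*|0*) echo "GPU count must be a positive integer"; return 1 ;;
    esac
    case "$partition" in
        normal)
            if [ "$gpus" -gt 16 ]; then
                echo "normal accepts 1 to 16 GPUs"; return 1
            fi ;;
        large)
            if [ $((gpus % 8)) -ne 0 ]; then
                echo "large requires a multiple of 8 GPUs (full nodes)"; return 1
            fi
            if [ "$gpus" -lt 16 ] || [ "$gpus" -gt 96 ]; then
                echo "large accepts 16 to 96 GPUs (2 to 12 nodes)"; return 1
            fi ;;
        cpu)
            echo "cpu has no GPUs; request CPU cores instead"; return 1 ;;
        *)
            echo "unknown partition: $partition"; return 1 ;;
    esac
    # Default per-GPU allocation: 14 CPU cores and 224GB system memory.
    echo "ok: $gpus GPUs -> $((gpus * 14)) cores, $((gpus * 224))GB memory by default"
}

check_gpu_request normal 4           # fits: 56 cores, 896GB
check_gpu_request large 12 || true   # rejected: not a multiple of 8
```

Requests above 12 nodes are rejected by this sketch as well, since the policy routes them through reservation rather than a regular job request.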