Partitions are work queues, each with its own set of rules and policies and its own pool of compute nodes for running jobs. The available partitions are normal, large, and cpu. Run sinfo to list the partitions available on the cluster.
Resource Request Policy
- Computational resources on the HKUST SuperPOD are requested in units of H800 (80GB) GPUs. In Slurm, each GPU comes with the following default CPU cores and system memory:
  - 14 CPU cores (28 threads)
  - 224GB system memory
- In general, we recommend that users specify only the --gpus parameter for the number of GPUs and the --nodes parameter for the number of nodes in a job request, and let Slurm allocate the cores and memory among the nodes for optimized resource utilization.
- The normal partition supports job requests for mainstream GPU computation, from a single H800 GPU up to a maximum of 16 GPUs.
- The large partition supports large multi-node job requests. GPUs must be requested in multiples of 8 H800 GPUs, i.e. full nodes. The minimum request is 2 nodes (16 GPUs) and the maximum is 12 nodes (96 GPUs).
- The number of nodes assigned to the large and normal partitions may vary depending on workload conditions.
- Job requests for a very large number of nodes, i.e. more than 12 nodes, must be arranged by reservation only.
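As an illustration of the recommendation above, here is a minimal sketch of a batch script that requests GPUs with only --gpus and --nodes and leaves cores and memory to Slurm. The job name, wall time, and payload commands are hypothetical placeholders, and the site may require further options (for example an account) that are not described in this section.

```shell
#!/bin/bash
#SBATCH --job-name=demo        # hypothetical job name
#SBATCH --partition=normal     # mainstream GPU partition (1 to 16 GPUs)
#SBATCH --nodes=1              # number of nodes
#SBATCH --gpus=2               # total GPUs; Slurm assigns the default cores and memory
#SBATCH --time=02:00:00        # hypothetical wall time, within the 3-day limit

# For the large partition, request full nodes instead, for example:
#   --partition=large --nodes=2 --gpus=16

# Placeholder payload: report where the job is running.
echo "Job ${SLURM_JOB_ID:-unknown} on $(hostname)"
```

Saved under a name of your choice (job.sh is used here only as an example), the script would be submitted with sbatch job.sh.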
Partition Table
| Slurm Partition | large | normal | cpu |
|---|---|---|---|
| No. of nodes | 35 DGX nodes | 20 DGX nodes | 2 Intel nodes |
| Purpose | Large-scale, multi-node GPU computation | Mainstream GPU computation | Data pre-processing for GPU computation |
| Max wall time | 3 days | 3 days | 12 hours |
| Min resource requested per job | 16 GPUs (equivalent to 2 nodes) | 1 GPU | 1 CPU core |
| Max resource requested per account | 96 GPUs (equivalent to 12 nodes) | 16 GPUs | 8 CPU cores (per job) |
| Concurrent running jobs quota per user | 4 | 8 | 28 |
| Queuing and running jobs limit per user | 5 | 10 | 28 |
| Chargeable | Yes | Yes | No |
| Interactive job | One session allowed, with a maximum wall time of 2 hours | One session allowed, with a maximum wall time of 2 hours | Not allowed |
| Remarks | GPU resources must be requested in multiples of 8 (full nodes) | GPU resources can be requested in any quantity up to the maximum | No access to the /scratch directory for the time being |
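
The GPU limits in the table, together with the per-GPU defaults of 14 CPU cores and 224GB of memory, can be checked before submitting a job. The following is a rough sketch only: check_request is a hypothetical helper, not a tool provided on the cluster. It assumes the GPU count is a whole number, covers only the GPU count rules for the normal and large partitions, and does not replace the checks Slurm itself performs.

```shell
#!/bin/bash
# check_request PARTITION GPUS  (hypothetical helper, not a site tool)
# Validates a GPU count against the partition rules listed above and
# prints the default CPU cores and memory that come with the request.
check_request() {
    local partition=$1 gpus=$2
    case "$partition" in
        normal)
            # normal: any quantity from 1 to 16 GPUs
            if (( gpus < 1 || gpus > 16 )); then
                echo "normal: request 1 to 16 GPUs" >&2
                return 1
            fi
            ;;
        large)
            # large: full nodes only (multiples of 8), 16 to 96 GPUs
            if (( gpus < 16 || gpus > 96 || gpus % 8 != 0 )); then
                echo "large: request 16 to 96 GPUs in multiples of 8" >&2
                return 1
            fi
            ;;
        *)
            echo "unknown GPU partition: $partition" >&2
            return 1
            ;;
    esac
    # Defaults per GPU: 14 CPU cores and 224GB system memory
    echo "$gpus GPUs -> $(( gpus * 14 )) CPU cores, $(( gpus * 224 )) GB memory"
}
```

For example, check_request normal 2 reports 28 CPU cores and 448 GB of memory, while check_request large 20 is rejected because 20 is not a multiple of 8.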