Supercomputer architecture

Configuration of Irene

The compute nodes are gathered in partitions according to their hardware characteristics (CPU architecture, amount of RAM, presence of GPU, etc). A partition is a set of identical nodes that can be targeted to host one or several jobs. Choosing the right partition for a job depends on code prerequisites in term of hardware resources. For example, executing a code designed to be GPU accelerated requires a partition with GPU nodes.
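For illustration, here is a minimal job script sketch selecting a partition with the -q directive of ccc_msub. The project name, core count, walltime and executable are placeholders to adapt; refer to the job submission documentation for the complete list of #MSUB directives.

#!/bin/bash
#MSUB -r my_job          # job name
#MSUB -q skylake         # target partition
#MSUB -n 48              # number of MPI tasks (here, one full skylake node)
#MSUB -T 1800            # walltime limit in seconds
#MSUB -A myproject       # project used for hour accounting (placeholder)
#MSUB -o my_job_%I.o     # standard output file
#MSUB -e my_job_%I.e     # standard error file
ccc_mprun ./my_code      # launch the parallel executable

The script is then submitted with ccc_msub:

$ ccc_msub my_job.sh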

The Irene supercomputer offers several kinds of nodes: regular compute nodes (Intel Skylake, Intel KNL, AMD Rome and ARM A64FX), large memory nodes, and hybrid/GPU nodes.

  • Skylake nodes for regular computation
    • Partition name: skylake
    • CPUs: 2x24-cores Intel Skylake@2.7GHz (AVX512)
    • Cores/Node: 48
    • Nodes: 1 653
    • Total cores: 79 344
    • RAM/Node: 180GB
    • RAM/Core: 3.75GB
  • KNL nodes for regular computation
    • Partition name: knl
    • CPUs: 1x68-cores Intel KNL@1.4GHz
    • Cores/Node: 64 (4 additional cores are reserved for the operating system; they are referenced by the scheduler but not taken into account for the accounting of compute hours)
    • Nodes: 824
    • Total cores: 56 032
    • RAM/Node: 88GB
    • RAM/Core: 1.3GB
    • Cluster mode is set to quadrant
    • MCDRAM (Multi-Channel Dynamic Random Access Memory) is set as a last-level cache (cache mode)
  • AMD Rome nodes for regular computation
    • Partition name: Rome
    • CPUs: 2x64-cores AMD Rome@2.6GHz (AVX2)
    • Cores/Node: 128
    • Nodes: 2286
    • Total cores: 292 608
    • RAM/Node: 228GB
    • RAM/Core: 1.8GB
  • Hybrid nodes for GPU computing and graphical usage
    • Partition name: hybrid
    • CPUs: 2x24-cores Intel Skylake@2.7GHz (AVX2)
    • GPUs: 1x Nvidia Pascal P100
    • Cores/Node: 48
    • Nodes: 20
    • Total cores: 960
    • RAM/Node: 180GB
    • RAM/Core: 3.75GB
    • I/O: 1 HDD 250 GB + 1 SSD 800 GB/NVMe
  • Fat nodes with a lot of shared memory for computation lasting a reasonable amount of time and using no more than one node
    • Partition name: xlarge
    • CPUs: 4x28-cores Intel Skylake@2.1GHz
    • GPUs: 1x Nvidia Pascal P100
    • Cores/Node: 112
    • Nodes: 5
    • Total cores: 560
    • RAM/Node: 3TB
    • RAM/Core: 27GB
    • I/O: 2 HDD 1 TB + 1 SSD 1600 GB/NVMe
  • V100 nodes for GPU computing and AI
    • Partition name: V100
    • CPUs: 2x20-cores Intel Cascadelake@2.1GHz (AVX512)
    • GPUs: 4x Nvidia Tesla V100
    • Cores/Node: 40
    • Nodes: 32
    • Total cores: 1280 (+ 128 GPU)
    • RAM/Node: 175 GB
    • RAM/Core: 4.4 GB
  • V100l nodes for GPU computing and AI
    • Partition name: V100l
    • CPUs: 2x18-cores Intel Cascadelake@2.6GHz (AVX512)
    • GPUs: 1x Nvidia Tesla V100
    • Cores/Node: 36
    • Nodes: 30
    • Total cores: 1080 (+ 30 GPU)
    • RAM/Node: 355 GB
    • RAM/Core: 9.9 GB
  • V100xl nodes for GPU computing and AI
    • Partition name: V100xl
    • CPUs: 4x18-cores Intel Cascadelake@2.6GHz (AVX512)
    • GPUs: 1x Nvidia Tesla V100
    • Cores/Node: 72
    • Nodes: 2
    • Total cores: 144 (+ 2 GPU)
    • RAM/Node: 2.9 TB
    • RAM/Core: 40 GB
  • ARM A64FX for regular computation
    • Partition name: A64FX
    • CPUs: 1x48-cores A64FX Armv8.2-A SVE@1.8GHz
    • Cores/Node: 48
    • Nodes: 80
    • Total cores: 3840
    • RAM/Node: 32GB
    • RAM/Core: 666MB

Note that depending on the computing share owned by the partner you are attached to, you may not have access to all the partitions. You can check on which partition(s) your project has been allocated hours with the ccc_myproject command.

ccc_mpinfo displays the available partitions/queues to which jobs can be submitted.

$ ccc_mpinfo
                      --------------CPUS------------  -------------NODES------------
PARTITION    STATUS   TOTAL   DOWN    USED    FREE    TOTAL   DOWN    USED    FREE     MpC   CpN SpN CpS TpC
---------    ------   ------  ------  ------  ------  ------  ------  ------  ------   ----- --- --- --- ---
skylake      up         9960       0    9773     187     249       0     248       1    4500  40   2  20   1
xlarge       up          192       0     192       0       3       0       3       0   48000  64   4  16   1
hybrid       up          140       0      56      84       5       0       2       3    8892  28   2  14   1
v100         up          120       0       0     120       3       0       0       3    9100  40   2  20   1
  • MpC: amount of memory per core
  • CpN: number of cores per node
  • SpN: number of sockets per node
  • CpS: number of cores per socket
  • TpC: number of threads per core. This allows for SMT (Simultaneous Multithreading, known as Hyper-Threading on Intel architectures)
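For instance, to quickly check the state of a single partition before submitting a job, the ccc_mpinfo output above can be filtered with standard tools (skylake is just an example here):

$ ccc_mpinfo | grep -E '^PARTITION|^skylake'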

Interconnect

The compute nodes are connected through an EDR InfiniBand network in a pruned fat-tree topology. This high-throughput, low-latency network is used for I/O and for communications among the nodes of the supercomputer.

Lustre

Lustre is a parallel distributed file system, commonly used for large-scale cluster computing. It relies on a set of multiple I/O servers, which the Lustre software presents as a single unified file system.

The major Lustre components are the MDS (MetaData Server) and the OSSs (Object Storage Servers). The MDS stores metadata such as file names, directories, access permissions, and file layout; it is not involved in any actual I/O operation. The data itself is stored on the OSSs. Note that a single file can be striped across several OSSs, which is one of the benefits of Lustre when working with large files.
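The striping of a file over several OSSs can be inspected and tuned with the standard lfs utility, as sketched below. The stripe count and stripe size are illustrative values only; see Lustre best practice for recommended settings.

$ lfs getstripe my_file            # display the current striping of a file
$ lfs setstripe -c 4 -S 2M my_dir  # files created in my_dir will be striped over
                                   # 4 storage targets with a 2 MB stripe size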

More information on how Lustre works, together with the associated best practices, is available in Lustre best practice.