Systems

[Image: Black server cabinets in a large hall. © IT Center]

Cluster Aix-la-Chapelle (CLAIX)

CLAIX stands for Cluster Aix-la-Chapelle and collectively refers to the various expansion stages of the high-performance computing (HPC) system and its data storage at RWTH Aachen University. The current system consists of the CLAIX-2023 expansion stage and several file systems for different task areas.

The technical documentation for the RWTH High Performance Computing service can be found in our documentation portal, IT Center Help.

CLAIX-2023

Following a tender focused on total cost, NEC was selected at the end of 2022 as the supplier for the CLAIX-2023 expansion stage. As the name suggests, the system is scheduled to enter test operation at the end of 2023 and unrestricted production operation from 2024.

CLAIX-2023 consists of over 660 compute nodes, each with two Intel Sapphire Rapids processors with 48 cores each and 256 to 1024 GB of DDR5 memory. In addition, there are 51 compute nodes of the same base architecture, each equipped with four NVIDIA Hopper H100 GPUs (coupled via NVLink) as accelerators; these are available for special application areas such as machine learning. For interactive work with the system, CLAIX-2023 also provides six dialog (login) systems of compatible architecture. All nodes are connected via an NVIDIA/Mellanox NDR InfiniBand network at 200 Gbit/s.

[Image: Server cabinet from the front with colored lights]

Data storage

Various file systems are provided for storing data; they differ in their intended usage scenarios.

A highly available GPFS-based storage system from DDN offers a capacity of approx. 4 PiB and a bandwidth of 80 GB/s (read and write) and is available as $HOME. The file system supports snapshots, allowing users to recover data on their own after accidental deletion or modification. It is additionally protected by a disaster-proof backup, which is why the capacity granted to users is strictly limited. The same file system technology, without this backup but with a larger grantable capacity, is available as $WORK.
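The division of labor between the two GPFS file systems can be sketched in a short shell snippet: small, hard-to-reproduce files go to the backed-up $HOME, while large regenerable data goes to $WORK. On the cluster both variables are predefined in every session; the /tmp fallback paths and the project name below are purely illustrative so the sketch runs anywhere.

```shell
# Illustrative split between $HOME (snapshots + backup, small quota)
# and $WORK (no disaster backup, larger quota). The /tmp fallbacks
# only exist to make this sketch runnable outside the cluster.
HOME_FS="${HOME:-/tmp/home_demo}"
WORK_FS="${WORK:-/tmp/work_demo}"
PROJECT="demo_project"   # hypothetical project name

mkdir -p "$HOME_FS/$PROJECT"           # scripts, configs, final results
mkdir -p "$WORK_FS/$PROJECT/scratch"   # large intermediate data

# Keep the job script and its configuration where they are backed up ...
printf '#!/bin/sh\necho running\n' > "$HOME_FS/$PROJECT/run.sh"
# ... and point the heavy output at $WORK.
echo "output_dir=$WORK_FS/$PROJECT/scratch" > "$HOME_FS/$PROJECT/run.conf"
```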

A high-performance Lustre storage system from DDN, built on Exascaler5 technology, provides 26 PB of capacity and 500 GB/s of read/write bandwidth and is available as $HPCWORK. The capacity granted to users can reach the petabyte range.

An ad-hoc file system based on BeeGFS technology aggregates the free SSD capacity of the compute nodes for the duration of a compute job; approximately 400 GiB are available per participating node. It offers maximum metadata performance at high bandwidth (comparable to $HPCWORK) and also handles very large numbers of small files well, as is typical of AI workloads. It is available as $BEEOND.
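Because $BEEOND only exists for the lifetime of a job, the usual pattern is stage-in, compute, stage-out within one batch script. The sketch below shows that pattern in plain shell; the #SBATCH directives and the exact way BeeOND is requested are assumptions to be checked against the cluster documentation in IT Center Help, and the /tmp fallbacks only make the sketch runnable outside the cluster.

```shell
#!/bin/sh
# Hypothetical batch-job sketch: stage data into the job-local $BEEOND
# file system, compute there, then copy results back to permanent storage.
# The scheduler directives below are illustrative, not verified settings.
#SBATCH --nodes=2
#SBATCH --time=01:00:00

WORK_FS="${WORK:-/tmp/work_demo}"        # permanent storage (survives the job)
BEEOND_FS="${BEEOND:-/tmp/beeond_demo}"  # job-local ad-hoc file system
mkdir -p "$WORK_FS" "$BEEOND_FS"

# Prepare a dummy input so the sketch is self-contained.
echo "sample input" > "$WORK_FS/input.dat"

# 1. Stage in: copy the input onto the fast job-local file system.
cp "$WORK_FS/input.dat" "$BEEOND_FS/"

# 2. Compute: placeholder for the real application reading/writing $BEEOND.
wc -c < "$BEEOND_FS/input.dat" > "$BEEOND_FS/result.txt"

# 3. Stage out: $BEEOND disappears when the job ends, so copy results back.
cp "$BEEOND_FS/result.txt" "$WORK_FS/"
```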

Operational Strategy RWTH High Performance Computing

The high-performance computer is operated by the IT Center of RWTH Aachen University. Under the 1-cluster concept, all resources of the cluster are made available to users through a single interface, so that different expansion stages, innovative architectures, and data can be used via the same processes.