The future of high performance computing @ PIK

In 2023, the institute's high-performance computing (HPC) system, in operation since July 2015 without a single unplanned downtime, will be replaced by a new system that will provide about five times the computing capacity of its predecessor.

Overview

We are pleased to announce that the EU-wide tender process for replacing the institute's current high-performance computing (HPC) system has been successfully completed. The process started almost a year ago with the qualification of bidders, and a contract for the delivery of a new HPC system has been signed with pro-com Datensysteme GmbH.

The new system will be built almost exclusively on the very latest computer technologies, in particular the recently announced Lenovo Neptune direct water-cooled HPC server systems, equipped with the latest generations of AMD "Genoa" CPUs and NVIDIA "Hopper" graphics processors; IBM Elastic Storage (ESS) systems and TS4500 tape libraries; and, last but not least, "next data rate" (NDR) NVIDIA Quantum-2 InfiniBand network gear.

Installation of the new system will take place in several stages, starting in December 2022 with the delivery of the management and storage systems. The IT facilities, in particular the new direct water cooling system, which will permit highly efficient cooling with water as warm as 40°C, will be prepared in early 2023. This work is expected to take about two weeks and is the only installation phase in which the current HPC system will have to be taken out of service for an extended period of time.

Installation of the CPU partition for general-purpose scientific modeling and numerical simulations is scheduled to be completed by May 2023. Depending on the availability of the new graphics processors, it will be followed by the installation of a GPU partition for machine learning applications. The final acceptance phase of the new system is currently planned for September 2023.

Compared to the HPC system in use at the institute today, the new system is expected to provide about five times the throughput for calculations and file operations and twice the available online file storage capacity. This does not yet account for an additional, semi-automatic storage layer for cold data, which will be used for the first time in an HPC system at the institute. High-performance random access memory capacity will increase seven-fold. Based on data provided by the vendor, average power consumption will only rise two-fold, which indicates that the overall efficiency of the new HPC system will increase significantly as well.
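
The efficiency statement can be checked with simple arithmetic. The following is a minimal sketch that uses only the two ratios quoted above (about five times the throughput, about twice the average power draw); the variable names are illustrative, and the result is only as accurate as those rounded vendor figures.

```python
# Ratios relative to the current system, as quoted in the announcement.
throughput_factor = 5.0  # about five times the throughput
power_factor = 2.0       # average power consumption roughly doubles (vendor data)

# Throughput per unit of power, relative to the current system.
efficiency_gain = throughput_factor / power_factor

# Energy needed for a fixed amount of work, relative to the current system.
energy_per_unit_of_work = power_factor / throughput_factor

print(f"Throughput per watt: {efficiency_gain:.1f}x the current system")
print(f"Energy per unit of work: {energy_per_unit_of_work:.0%} of the current system")
```

Under these assumptions the new system would deliver roughly 2.5 times the throughput per watt, or equivalently need about 40% of the energy for the same amount of work.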

The current HPC system will continue to operate throughout most of the installation, except during the installation of the new direct water cooling system, the migration of user data, and the final performance tests.

Hardware capabilities of the new system

  1. CPU partition with 30,720 AMD EPYC 9554 "Genoa" CPU cores @ 3.1 GHz base clock and
     a total of 180 TByte DDR5 RAM @ 4800 MT/s (6 GB RAM per CPU core);

  2. GPU partition with 8 systems, each with 4 NVIDIA "Hopper" GPUs, 2 CPUs and 1.5 TByte RAM;

  3. High-performance interconnect based on NVIDIA InfiniBand NDR200 and NDR technology;

  4. A fast NVMe storage tier with about 600 TByte net capacity;

  5. A capacity storage tier with about 8,000 TByte net capacity;

  6. Two tape storage libraries with an initial capacity of 30-60* PByte each for cold data, backup and disaster recovery.
* 30 PByte of tape cartridges installed; storage slots for an additional 30 PByte of cartridges available.
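
The figures in this list can be cross-checked against each other. The sketch below is illustrative only and rests on stated assumptions: the 64 cores per EPYC 9554 CPU is the vendor's published core count and is not given in this announcement, binary units (1 TByte = 1024 GB) are assumed for the RAM figure, and the GPU partition is read as 8 systems with 4 GPUs each.

```python
# Figures taken from the hardware list.
CORES_TOTAL = 30_720
RAM_TOTAL_TBYTE = 180
GPU_SYSTEMS = 8            # assumption: "8 x 4" means 8 systems with 4 GPUs each
GPUS_PER_SYSTEM = 4
TAPE_LIBRARIES = 2
TAPE_INSTALLED_PBYTE = 30  # per library, cartridges installed
TAPE_MAX_PBYTE = 60        # per library, with all storage slots filled

# Assumption not stated in the announcement: an EPYC 9554 has 64 cores.
CORES_PER_CPU = 64

# Assumes binary units: 1 TByte = 1024 GB.
ram_per_core_gb = RAM_TOTAL_TBYTE * 1024 / CORES_TOTAL
cpu_count = CORES_TOTAL // CORES_PER_CPU
gpu_count = GPU_SYSTEMS * GPUS_PER_SYSTEM
tape_installed_total = TAPE_LIBRARIES * TAPE_INSTALLED_PBYTE
tape_max_total = TAPE_LIBRARIES * TAPE_MAX_PBYTE

print(f"RAM per core: {ram_per_core_gb:.1f} GB")
print(f"CPUs in the CPU partition: {cpu_count}")
print(f"GPUs in the GPU partition: {gpu_count}")
print(f"Tape capacity: {tape_installed_total} PByte installed, {tape_max_total} PByte maximum")
```

With binary units, the 180 TByte of RAM works out to exactly the quoted 6 GB per core.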

System maintenance for a minimum of four years after acceptance

Acknowledgements

The directors and staff of PIK are grateful to the Land Brandenburg, and in particular to the Ministry of Science, Research and Culture, for funding this important investment, which will enable the institute to conduct its research at the highest level well into the year 2030.