EuroHPC: Traditional approaches are running out of steam!

"Long-term society’s willingness to invest in HPC will depend on our ability to solve the most important societal and scientific problems, not in the ability to execute a subset of calculations that scale well to a billion of cores in a single run"

To solve challenges on a global scale, we need access to extraordinary computing capacity, but we also need to build expertise, develop new approaches and explore new applications. EuroHPC, based in Luxembourg, aims to achieve this in the near future. Its objective is to establish European leadership in HPC by jointly procuring exascale & pre-exascale supercomputers, managing open calls for application development, and funding research and education collaboration across Europe.

At the end of March, Ariana, David & Sagar from the SURF Innovation Lab joined the EuroHPC Summit in Sweden, and we are happy to share that our impression of the event was very positive! We learned about European strategic priorities and met new experts & policy thinkers. We were able to discuss the initiatives of SURF and the Netherlands with the most important players in the European HPC scene.

The summit included discussions on a wide range of topics (it was a multi-day event, after all!). The event began with talks about the priorities in Europe for the coming years. These priorities included some of the usual suspects like Exascale Computing, Hyperconnectivity and Federation, but also innovative topics like Quantum-HPC integration, Technology Sovereignty, Heterogeneous Computing and the Energy Efficiency of large-scale research infrastructure. And, most importantly, the funding of scientific applications & ecosystem development, e.g. skills & expertise across European boundaries & cultures. We also represented SURF in the open workshop "Collaboration across Centers of Excellence & Competence Centres".

We can proudly say that we are already working on many of these areas at SURF! There was also significant emphasis on the need to focus on applications and their evolving complexity and requirements, on the importance of open software and hardware stacks, and on the need for “laboratories” for innovative, experimental and disruptive technology development (FYI: at SURF, we are already working on one!).

Quantum computing 

Quantum computing is becoming a more and more common topic in HPC circles. At the summit, the talks ranged from the Swedish quantum technology strategy (aiming for a >100-qubit computer in the coming years!) to current devices, their technological challenges, and the importance of integrating HPC with today's noisy quantum computers. No quantum advantage is expected from current devices, but from the HPC perspective we need to start connecting and integrating this disruptive technology into our models and software stacks. Integrating quantum computers into HPC is important to:

1 - increase uptake of quantum computing,  

2 - increase performance of quantum computers, and  

3 - provide accessibility to quantum computers by reusing existing HPC infrastructure (illustrated in the sketch below).
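As a purely illustrative example of what such an integration could look like at the workflow level, here is a minimal Python sketch of a hybrid loop: the classical optimization runs on the HPC side, and each candidate set of parameters is handed to a quantum backend for evaluation. The `QuantumBackend` class and its `run` method are hypothetical placeholders (here just a noisy simulator), not a real vendor or EuroHPC interface.

```python
# Minimal sketch of a hybrid HPC + quantum workflow (variational-style loop).
# QuantumBackend and its run() method are hypothetical placeholders standing in
# for whatever scheduler/QPU interface an HPC centre would actually expose.
import random


class QuantumBackend:
    """Hypothetical stand-in for a (possibly remote, noisy) quantum device."""

    def run(self, parameters: list[float]) -> float:
        # Pretend to execute a parameterised circuit and return a noisy
        # expectation value; a real backend would submit a circuit here.
        ideal = sum(p * p for p in parameters)
        return ideal + random.gauss(0.0, 0.05)  # simulated device noise


def hybrid_optimise(backend: QuantumBackend, n_params: int = 4, steps: int = 200):
    """Classical search loop (HPC side) steering the quantum evaluations."""
    params = [random.uniform(-1, 1) for _ in range(n_params)]
    best = backend.run(params)
    for _ in range(steps):
        candidate = [p + random.gauss(0.0, 0.1) for p in params]
        value = backend.run(candidate)  # offload evaluation to the QPU
        if value < best:                # keep improvements only
            params, best = candidate, value
    return params, best


if __name__ == "__main__":
    params, value = hybrid_optimise(QuantumBackend())
    print(f"best value found: {value:.3f}")
```

The pattern to note is the division of labour: the HPC system drives the classical search, while the quantum device is only asked to evaluate small, well-defined tasks, which is exactly why reusing existing HPC infrastructure for access and scheduling makes sense.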

RISC-V ecosystem

For those of you who do not know, RISC-V is an instruction set architecture (ISA) rooted in reduced instruction set computer (RISC) principles. RISC-V is unique, even revolutionary, because it is a common, free and open ISA: software can be ported to it, hardware can be developed for it, and processors can be built to support it. Most commonly used ISAs today are proprietary and require a license and fees for their use. Hence, RISC-V is very important for the democratization and autonomy of technology. Note that the focus of RISC-V is on open standards, not open source (a more detailed blog post about this will follow soon!). Until now, RISC-V has mostly focused on the industrial ecosystem, but there is a strong movement towards creating more synergy between RISC-V and (Euro)HPC.

Exascale computing

One way to measure the speed of a computer is in floating-point operations per second (FLOPS), where an “operation” is a simple arithmetic operation, like addition or multiplication, on numbers with a decimal point, such as 8.9.

A person can typically solve such an operation in about one second, which is 1 FLOPS, but computers are much faster. For example, the first supercomputer could do 3,000,000 FLOPS (3 megaFLOPS), and a typical laptop today is capable of a few teraFLOPS, i.e. a few trillion operations per second. “Exa” means 18 zeros, so an exascale computer can perform more than 1,000,000,000,000,000,000 FLOPS, or 1 exaFLOPS.
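To make these orders of magnitude concrete, here is a small back-of-the-envelope sketch in Python. The peak rates are the rough figures quoted above (the laptop value is an assumed ballpark), and the workload size is arbitrary, chosen only for illustration.

```python
# Back-of-the-envelope comparison of the FLOPS scales mentioned above.
# The peak rates are the rough figures from the text, not benchmarks.

PEAK_FLOPS = {
    "first supercomputer (~3 megaFLOPS)": 3e6,
    "typical laptop (~a few teraFLOPS)": 2e12,   # assumed ballpark
    "exascale system (1 exaFLOPS)": 1e18,
}

WORKLOAD_OPS = 1e21  # a hypothetical job needing 10^21 floating-point operations

for name, flops in PEAK_FLOPS.items():
    seconds = WORKLOAD_OPS / flops
    years = seconds / (3600 * 24 * 365)
    print(f"{name:40s} -> {seconds:.3g} s (~{years:.3g} years)")
```

At these peak rates, the hypothetical 10^21-operation job would take roughly a quarter of an hour on the exascale system, about 16 years on the laptop, and on the order of ten million years on the first supercomputer.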

So far, EuroHPC has had a strong focus on the acquisition and hosting of such systems (one will be hosted in Germany, the location of the second is still to be determined). At the summit, there was a big emphasis on the importance of energy-driven design, but also on the need to develop applications and software stacks that are hardware-efficient and can properly leverage these systems. Overall, the focus seems to be moving from acquiring systems to using them.

Sustainable computing 

Sustainable computing is yet another fascinating topic that is becoming more and more important in HPC. Some interesting remarks came from comparing data movement with computation from an energy point of view: a memory access consumes 10-100x the energy of an arithmetic operation!

With the growth of workloads that are dominated by data movement (e.g. AI), it is becoming more important to look towards data-centric architectures. In-memory computing is one such architecture: the idea is to perform both data storage and computation in the memory itself (a non-von Neumann architecture). RISC-V might play an important role in the development of this hardware.
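To get a feeling for why data-centric architectures pay off, here is a toy energy budget for a simple streaming kernel. The per-event energies are illustrative placeholders, chosen only to reflect the 10-100x memory-versus-arithmetic ratio mentioned above; they are not measured values for any particular system.

```python
# Toy energy budget for a dot product of two vectors of N doubles.
# The per-event energies below are illustrative placeholders, chosen only to
# reflect the ~10-100x memory-vs-arithmetic ratio mentioned in the text.

E_FLOP_PJ = 10           # energy per floating-point operation (assumed)
E_DRAM_ACCESS_PJ = 500   # energy per 8-byte DRAM access (assumed, ~50x a FLOP)

N = 1_000_000            # vector length

flops = 2 * N            # one multiply + one add per element
dram_accesses = 2 * N    # stream both input vectors from DRAM once

e_compute = flops * E_FLOP_PJ
e_movement = dram_accesses * E_DRAM_ACCESS_PJ

print(f"compute energy:  {e_compute / 1e6:.1f} uJ")
print(f"movement energy: {e_movement / 1e6:.1f} uJ")
print(f"movement / compute ratio: {e_movement / e_compute:.0f}x")
```

In this toy example essentially all of the energy goes into moving data rather than computing on it, which is exactly the imbalance that in-memory and other data-centric architectures aim to remove.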

Yet another interesting aspect regarding energy is the high cost of operating an HPC system. The price of 1 MWh varies a lot: within the EU alone it ranges from roughly 25 to 111 euros (for a 10 MW system, that is the difference between about 2 and 9 million euros per year!). Thus, for reducing operational costs, the location of an HPC centre can matter far more than system- and operational-level optimization. Hourly electricity rates also vary tremendously (even 10x!), so end-users should be steered towards lower energy prices (normally at night and during weekends).
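As a quick sanity check on those numbers, here is the annual-cost arithmetic as a minimal Python sketch, assuming continuous operation at a constant 10 MW and using the EU price range quoted above.

```python
# Annual electricity cost of a hypothetical 10 MW system running continuously,
# using the EU price range quoted above (EUR 25-111 per MWh).

POWER_MW = 10
HOURS_PER_YEAR = 24 * 365               # ~8760 h, assuming continuous operation
PRICE_EUR_PER_MWH = {"low": 25, "high": 111}

energy_mwh = POWER_MW * HOURS_PER_YEAR  # ~87,600 MWh per year

for label, price in PRICE_EUR_PER_MWH.items():
    cost = energy_mwh * price
    print(f"{label} price ({price} EUR/MWh): ~{cost / 1e6:.1f} M EUR/year")
```

This reproduces the roughly 2 to 9 million euro per year spread mentioned above.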

In general, HPC should focus on optimizing: 

1) CO2 emissions,

2) operational costs, through optimization of system utilization and energy costs,

3) science per watt, through architectural changes, and

4) energy savings, through operational and application choices.

Unconventional HPC + Heterogeneity  

“We are approaching the end of Moore’s law!” 

“Traditional approaches are running out of steam!” 

“Specialized computing is riding the next wave of computer architecture!” 

These were some of the headlines shared during presentations at the summit, and they summarize well the coming HPC trends: 1) specialized architectures and accelerators, 2) increased data locality, 3) non-von Neumann architectures and 4) unconventional models of computation.

New codes are being combined into complex workflows, and traditional barriers are dissolving. The question now is: how do we manage this heterogeneity? How do we integrate these unconventional resources?

From a system architecture perspective, we need to enable the integration of new technologies, e.g. via a modular supercomputing architecture (MSA). From a resource management perspective, we need dynamic orchestration and malleability. From the software stack perspective, we need to standardize programming models, smart compilers, runtime and workflow systems, and debugging and performance tools. Finally, from the applications perspective, novel workflows, mathematical formulations and new features need to be developed.

In general, bringing heterogeneity and new technologies to the table opens many opportunities but also brings several challenges: efficiently sharing resources and maximizing their utilization, identifying sources of performance loss and opportunities for optimization, programming the different components and maintaining portability among them, and handling the increasing complexity of the workflows. On the opportunities side: better energy efficiency, the ability to select the ideal hardware for each part of the application, and a wider range of technology providers (moving away from monopolistic scenarios), to name a few.
