Title :
Taming the beast: Some thoughts on exascale resiliency
Author_Institution :
Hasso Plattner Inst., Univ. of Potsdam, Potsdam, Germany
Abstract :
The design and operation of high performance computing (HPC) infrastructures is, and always has been, a huge technological challenge. In the past, whenever the next generation of HPC systems was about to be designed, the community faced an ever-growing number of compute nodes, growing storage capacity, increasing heterogeneity of software, a new level of nonlinear computational load, questions of energy consumption and cooling, and many other non-functional issues. So far, everybody has managed to deal with these issues in an exceptional and creative way. This time, it is about to become really hard.
Keywords :
cooling; energy consumption; parallel programming; power aware computing; storage management; HPC infrastructures; compute nodes; exascale HPC systems; high performance computing infrastructures; next generation HPC system; nonlinear computational load; software heterogeneity; storage capacity; Correlation; Fault tolerance; Fault tolerant systems; Fault trees; High performance computing; Programming; Uncertainty
Conference_Title :
High Performance Computing and Simulation (HPCS), 2013 International Conference on
Conference_Location :
Helsinki
Print_ISBN :
978-1-4799-0836-3
DOI :
10.1109/HPCSim.2013.6641469