DocumentCode :
2956221
Title :
Resilience Challenges for Exascale Systems
Author :
Jouppi, Norman P.
fYear :
2009
fDate :
7-9 Oct. 2009
Firstpage :
379
Lastpage :
379
Abstract :
The combination of decreasing device reliability due to deep submicron scaling, increasing integration, and the size of future exascale high-performance computers and cloud datacenters poses significant challenges for system resilience. Furthermore, with power and cost being of critical importance, resilience must be provided efficiently and economically. Although providing resilience will require a range of approaches at all levels of the system stack, the final responsibility rests at the system level. In addition to highlighting challenges, this talk reviews and introduces promising system-level techniques such as configurable isolation, duplication caching, multicore DIMMs, CoVeRT, and 3D checkpointing.
Keywords :
computer centres; scaling circuits; semiconductor device reliability; 3D checkpointing; CoVeRT; cloud datacenters; configurable isolation; deep submicron scaling; device reliability; duplication caching; exascale high-performance computers; multicore DIMM; system level techniques; system resilience; CMOS technology; Cloud computing; Computer architecture; Fault tolerant systems; Microprocessors; Power system reliability; Resilience; Technological innovation; Timing; Very large scale integration; checkpointing; duplication; exascale systems; isolation
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Defect and Fault Tolerance in VLSI Systems, 2009. DFT '09. 24th IEEE International Symposium on
Conference_Location :
Chicago, IL
ISSN :
1550-5774
Print_ISBN :
978-0-7695-3839-6
Type :
conf
DOI :
10.1109/DFT.2009.52
Filename :
5372234