DocumentCode
244403
Title
Coarse-Grained Energy Modeling of Rollback/Recovery Mechanisms
Author
Ibtesham, Dewan ; DeBonis, David ; Arnold, Dorian ; Ferreira, Kurt B.
Author_Institution
Dept. of Comput. Sci., Univ. of New Mexico, Albuquerque, NM, USA
fYear
2014
fDate
23-26 June 2014
Firstpage
708
Lastpage
713
Abstract
As high-performance computing systems continue to grow in size and complexity, energy efficiency and reliability have emerged as first-order concerns. Researchers have shown that data movement is a significant contributing factor to power consumption on these systems. Additionally, rollback/recovery protocols like checkpoint/restart can generate large volumes of data traffic, exacerbating the energy and power concerns. In this work, we show that a coarse-grained model can be used effectively to speculate about the energy footprints of rollback/recovery protocols. Using our validated model, we evaluate the energy footprint of checkpoint compression, a method that incurs higher computational demand to reduce data volumes and data traffic. Specifically, we show that while checkpoint compression leads to more frequent checkpoints (as per the optimal checkpoint frequency) and increases per-checkpoint energy cost, compression still yields a decrease in total application energy consumption due to the overall runtime decrease.
Keywords
checkpointing; energy conservation; parallel processing; power consumption; protocols; software reliability; checkpoint compression; checkpoint energy cost; coarse-grained energy modeling; coarse-grained model; computational demand; data movement; data traffic; data volumes; energy consumption; energy efficiency; energy footprints; energy reliability; first-order concern; high-performance computing system; optimal checkpoint frequency; rollback/recovery mechanisms; rollback/recovery protocol; runtime decrease; Energy consumption; Energy measurement; Optimization; Power measurement; Predictive models; Protocols; Time measurement; Checkpoint Compression; Checkpoint Restart; Fault Tolerance; Modeling
fLanguage
English
Publisher
IEEE
Conference_Titel
Dependable Systems and Networks (DSN), 2014 44th Annual IEEE/IFIP International Conference on
Conference_Location
Atlanta, GA
Type
conf
DOI
10.1109/DSN.2014.71
Filename
6903629