• DocumentCode
    244403
  • Title
    Coarse-Grained Energy Modeling of Rollback/Recovery Mechanisms
  • Author
    Ibtesham, Dewan ; DeBonis, David ; Arnold, Dorian ; Ferreira, Kurt B.
  • Author_Institution
    Dept. of Comput. Sci., Univ. of New Mexico, Albuquerque, NM, USA
  • fYear
    2014
  • fDate
    23-26 June 2014
  • Firstpage
    708
  • Lastpage
    713
  • Abstract
    As high-performance computing systems continue to grow in size and complexity, energy efficiency and reliability have emerged as first-order concerns. Researchers have shown that data movement is a significant contributing factor to power consumption on these systems. Additionally, rollback/recovery protocols like checkpoint/restart can generate large volumes of data traffic, exacerbating the energy and power concerns. In this work, we show that a coarse-grained model can be used effectively to speculate about the energy footprints of rollback/recovery protocols. Using our validated model, we evaluate the energy footprint of checkpoint compression, a method that incurs higher computational demand to reduce data volumes and data traffic. Specifically, we show that while checkpoint compression leads to more frequent checkpoints (as per the optimal checkpoint frequency) and increases per-checkpoint energy cost, compression still yields a decrease in total application energy consumption due to the overall runtime decrease.
  • Keywords
    checkpointing; energy conservation; parallel processing; power consumption; protocols; software reliability; checkpoint compression; checkpoint energy cost; coarse-grained energy modeling; coarse-grained model; computational demand; data movement; data traffic; data volumes; energy consumption; energy efficiency; energy footprints; energy reliability; first-order concern; high-performance computing system; optimal checkpoint frequency; power consumption; rollback/recovery mechanisms; rollback/recovery protocol; runtime decrease; Energy consumption; Energy measurement; Optimization; Power measurement; Predictive models; Protocols; Time measurement; Checkpoint Compression; Checkpoint Restart; Fault Tolerance; Modeling;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Dependable Systems and Networks (DSN), 2014 44th Annual IEEE/IFIP International Conference on
  • Conference_Location
    Atlanta, GA
  • Type
    conf
  • DOI
    10.1109/DSN.2014.71
  • Filename
    6903629