• DocumentCode
    80194
  • Title

    Reliability-Aware Speedup Models for Parallel Applications with Coordinated Checkpointing/Restart

  • Author

    Ziming Zheng ; Li Yu ; Zhiling Lan

  • Author_Institution
    Dept. of Comput. Sci., Illinois Inst. of Technol., Chicago, IL, USA
  • Volume
    64
  • Issue
    5
  • fYear
    2015
  • fDate
    May 1 2015
  • Firstpage
    1402
  • Lastpage
    1415
  • Abstract
    Speedup models are powerful analytical tools for evaluating and predicting the performance of parallel applications. Unfortunately, the well-known speedup models like Amdahl's law and Gustafson's law do not take reliability into consideration and therefore cannot accurately account for application performance in the presence of failures. In this study, we enhance Amdahl's law and Gustafson's law by considering the impact of failures and the effect of coordinated checkpointing/restart. Unlike existing analytical studies relying on Exponential failure distribution alone, in this work we consider both Exponential and Weibull failure distributions in the construction of our reliability-aware speedup models. The derived reliability-aware models are validated through trace-based simulations under a variety of parameter settings. Our trace-based simulations demonstrate these models can effectively quantify failure impact on application speedup. Moreover, we present two case studies to illustrate the use of these reliability-aware speedup models.
  • Keywords
    Weibull distribution; checkpointing; exponential distribution; parallel processing; Weibull failure distributions; coordinated checkpointing/restart; exponential failure distribution; parallel applications; reliability-aware speedup models; trace-based simulations; Analytical models; Checkpointing; Computational modeling; Exponential distribution; Mathematical model; Reliability; Weibull distribution; Amdahl's law; Gustafson's law; Speedup; analytical modeling; reliability;
  • fLanguage
    English
  • Journal_Title
    Computers, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9340
  • Type

    jour

  • DOI
    10.1109/TC.2014.2317182
  • Filename
    6798722