• DocumentCode
    3065197
  • Title
    Proactive blocking coordinated checkpointing with dynamic intervals
  • Author
    Lotfi, Mehdi; Motamedi, Seyed Ahmad; Bandarabadi, Mojtaba
  • Author_Institution
    Electr. Eng. Dept., Amirkabir Univ. of Technol., Tehran
  • fYear
    2009
  • fDate
    15-17 March 2009
  • Firstpage
    118
  • Lastpage
    121
  • Abstract
    In this paper we introduce a new proactive blocking coordinated checkpointing method with dynamic intervals for cluster computing systems. Many current schemes for increasing the availability of cluster computing systems rely on either redundancy in space or redundancy in time (reactive methods). These methods impose overhead on the cluster computing system during failure-free execution. To minimize the performance loss (rollback and checkpoint overheads) caused by unexpected failures or by the unnecessary overhead of fault-tolerant mechanisms, we present a proactive method for the blocking coordinated checkpointing strategy. Existing checkpointing methods are static, with a constant checkpointing interval, and are based on the exponential distribution function. In this paper we use the Weibull distribution function to find the dynamic interval. Our method is based on an analysis of failure data from the LANL cluster system. Experimental results show that the average execution time of the NAS application is significantly reduced by the proposed method.
  • Keywords
    Weibull distribution; checkpointing; exponential distribution; program diagnostics; software fault tolerance; software performance evaluation; workstation clusters; Weibull distribution function; cluster computing system; dynamic interval; exponential distribution function; failure free execution time; fault tolerant mechanism; performance loss minimization; proactive blocking coordinated checkpointing; static checkpointing method; Availability; Clustering algorithms; Data analysis; Fault tolerance; Frequency synchronization; Hardware; Redundancy; Space technology; blocking coordinated checkpointing; proactive checkpointing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    System Theory, 2009. SSST 2009. 41st Southeastern Symposium on
  • Conference_Location
    Tullahoma, TN
  • ISSN
    0094-2898
  • Print_ISBN
    978-1-4244-3324-7
  • Electronic_ISBN
    0094-2898
  • Type
    conf
  • DOI
    10.1109/SSST.2009.4806842
  • Filename
    4806842