• DocumentCode
    2859770
  • Title
    Adaptive checkpointing in dynamic grids for uncertain job durations
  • Author
    Chtepen, Maria ; Dhoedt, Bart ; De Turck, Filip ; Demeester, Piet ; Claeys, Filip H A ; Vanrolleghem, Peter A.
  • Author_Institution
    INTEC-IBBT, Ghent Univ., Ghent, Belgium
  • fYear
    2009
  • fDate
    22-25 June 2009
  • Firstpage
    585
  • Lastpage
    590
  • Abstract
    Adaptive checkpointing is a relatively new approach that is particularly suitable for providing fault-tolerance in dynamic and unstable grid environments. The approach allows for periodic modification of checkpointing intervals at run-time, when additional information becomes available. In this paper an adaptive algorithm, named MeanFailureCP+, is introduced that deals with checkpointing of grid applications with execution times that are unknown a priori. The algorithm modifies its parameters, based on dynamically collected feedback on its performance. Simulation results show that the new algorithm performs even better than adaptive approaches that make use of exact information on job execution times.
  • Keywords
    grid computing; software fault tolerance; MeanFailureCP+; adaptive algorithm; adaptive checkpointing; fault-tolerance; uncertain job duration; Checkpointing; Computational modeling; Computer networks; Feedback; Job design; Resource management; Runtime
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Technology Interfaces, 2009. ITI '09. Proceedings of the ITI 2009 31st International Conference on
  • Conference_Location
    Dubrovnik
  • ISSN
    1330-1012
  • Print_ISBN
    978-953-7138-15-8
  • Type
    conf
  • DOI
    10.1109/ITI.2009.5196152
  • Filename
    5196152