• DocumentCode
    1992478
  • Title

    Transparent and Autonomic Rollback-Recovery in Cluster Systems

  • Author

    Maloney, Andrew ; Goscinski, Andrzej

  • Author_Institution
    Sch. of Eng. & Inf. Technol., Deakin Univ., Geelong, VIC, Australia
  • fYear
    2008
  • fDate
    8-10 Dec. 2008
  • Firstpage
    541
  • Lastpage
    548
  • Abstract
    Cluster systems provide an excellent environment for running computation-hungry applications. However, because they are built from commodity components, they are prone to failures. To overcome these failures we propose to use rollback-recovery, which consists of checkpointing and recovery facilities. Checkpointing facilities have been the focus of many previous studies; however, recovery facilities have been overlooked. This paper focuses on the requirements, concept and architecture of recovery facilities. The synthesized fault-tolerant system was implemented in the GENESIS system and evaluated. The results show that the synthesized system is efficient and scalable.
  • Keywords
    checkpointing; fault tolerant computing; workstation clusters; GENESIS system; cluster system rollback-recovery; fault tolerant system; recovery facilities; Application software; Australia; Computer applications; Concurrent computing; Distributed computing; Fault tolerance; Fault tolerant systems; Information technology; Operating systems; Cluster systems; Rollback-recovery
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Systems, 2008. ICPADS '08. 14th IEEE International Conference on
  • Conference_Location
    Melbourne, VIC
  • ISSN
    1521-9097
  • Print_ISBN
    978-0-7695-3434-3
  • Type

    conf

  • DOI
    10.1109/ICPADS.2008.117
  • Filename
    4724363