DocumentCode
1992478
Title
Transparent and Autonomic Rollback-Recovery in Cluster Systems
Author
Maloney, Andrew ; Goscinski, Andrzej
Author_Institution
Sch. of Eng. & Inf. Technol., Deakin Univ., Geelong, VIC, Australia
fYear
2008
fDate
8-10 Dec. 2008
Firstpage
541
Lastpage
548
Abstract
Cluster systems provide an excellent environment for running computation-hungry applications. However, because they are built from commodity components, they are prone to failures. To overcome these failures we propose to use rollback-recovery, which consists of checkpointing and recovery facilities. Checkpointing facilities have been the focus of many previous studies; however, recovery facilities have been overlooked. This paper focuses on the requirements, concept and architecture of recovery facilities. The synthesized fault-tolerant system was implemented in the GENESIS system and evaluated. The results show that the synthesized system is efficient and scalable.
Keywords
checkpointing; fault tolerant computing; workstation clusters; GENESIS system; cluster system rollback-recovery; fault tolerant system; recovery facilities; Application software; Australia; Checkpointing; Computer applications; Concurrent computing; Distributed computing; Fault tolerance; Fault tolerant systems; Information technology; Operating systems; Cluster systems; Rollback-recovery
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel and Distributed Systems, 2008. ICPADS '08. 14th IEEE International Conference on
Conference_Location
Melbourne, VIC
ISSN
1521-9097
Print_ISBN
978-0-7695-3434-3
Type
conf
DOI
10.1109/ICPADS.2008.117
Filename
4724363