DocumentCode :
3092198
Title :
Exploiting operating system services to efficiently checkpoint parallel applications in GENESIS
Author :
Rough, Justin ; Goscinski, Andrzej
Author_Institution :
Sch. of Comput. & Math., Deakin Univ., Geelong, Vic., Australia
fYear :
2002
fDate :
23-25 Oct. 2002
Firstpage :
261
Lastpage :
268
Abstract :
Recent research efforts in parallel processing on non-dedicated clusters have focused on high execution performance, parallelism management, transparent access to resources, and making clusters easy to use. However, as collections of independent computers used by multiple users, clusters are susceptible to failure. This paper presents the development of a coordinated checkpointing facility for the GENESIS cluster operating system. The facility was developed by exploiting existing operating system services. High performance and low overhead are achieved by allowing the processes of a parallel application to continue executing while checkpoints are created, and the use of coordinated checkpointing keeps demands on cluster resources low.
Keywords :
network operating systems; software fault tolerance; software performance evaluation; system recovery; workstation clusters; GENESIS; cluster operating system; coordinated checkpointing facility; high execution performance; low overheads; nondedicated clusters; operating system services; parallel applications; parallelism management; transparent access; Application software; Australia; Checkpointing; Concurrent computing; Mathematics; Operating systems; Parallel processing; Programming profession; Resource management; Scalability;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Algorithms and Architectures for Parallel Processing, 2002. Proceedings. Fifth International Conference on
Conference_Location :
Beijing, China
Print_ISBN :
0-7695-1512-6
Type :
conf
DOI :
10.1109/ICAPP.2002.1173584
Filename :
1173584