DocumentCode
3469622
Title
Locks and barriers in checkpointing and recovery
Author
Badrinath, Ramamurthy ; Morin, Christine
Author_Institution
CSE Dept., Indian Inst. of Technol., Kharagpur, India
fYear
2004
fDate
19-22 April 2004
Firstpage
459
Lastpage
466
Abstract
Dependency tracking between communicating tasks is an important concept in backward error recovery for parallel applications. The traditional dependency tracking model for message-passing systems can be extended to track dependencies between shared memory and task-private states in shared-memory applications. The objective of this paper is to analyze the issues raised by locks and barriers in parallel applications, so that tasks can be checkpointed at any time (even while holding or waiting for locks and barriers). In particular, we attempt to extend earlier dependency tracking mechanisms to locks and barriers. We address both coordinated and uncoordinated checkpointing schemes.
Keywords
fault tolerant computing; message passing; parallel programming; shared memory systems; system recovery; workstation clusters; backward error recovery; barriers; communicating tasks; coordinated checkpointing; dependency tracking; locks; message passing systems; parallel applications; private states; shared memory; uncoordinated checkpointing; Checkpointing; Context modeling; Fault detection; Fault tolerance; Fault tolerant systems; Grid computing; Hardware; Kernel; Message passing; Protocols
fLanguage
English
Publisher
IEEE
Conference_Titel
Cluster Computing and the Grid, 2004. CCGrid 2004. IEEE International Symposium on
Print_ISBN
0-7803-8430-X
Type
conf
DOI
10.1109/CCGrid.2004.1336601
Filename
1336601