• DocumentCode
    3469622
  • Title
    Locks and barriers in checkpointing and recovery
  • Author
    Badrinath, Ramamurthy; Morin, Christine
  • Author_Institution
    CSE Dept., Indian Inst. of Technol., Kharagpur, India
  • fYear
    2004
  • fDate
    19-22 April 2004
  • Firstpage
    459
  • Lastpage
    466
  • Abstract
    Dependency tracking between communicating tasks is an important concept in backward error recovery for parallel applications. The traditional dependency tracking model for message-passing systems can be extended to track dependencies between shared memory and task-private states in shared-memory applications. The objective of this paper is to analyze the issues raised by locks and barriers in parallel applications, so that tasks can be checkpointed at any time (even while holding or waiting for locks and barriers). In particular, we attempt to extend earlier dependency tracking mechanisms to locks and barriers. We address both coordinated and uncoordinated checkpointing schemes.
  • Keywords
    fault tolerant computing; message passing; parallel programming; shared memory systems; system recovery; workstation clusters; backward error recovery; barriers; communicating tasks; coordinated checkpointing; dependency tracking; locks; message passing systems; parallel applications; private states; shared memory; uncoordinated checkpointing; Checkpointing; Context modeling; Fault detection; Fault tolerance; Fault tolerant systems; Grid computing; Hardware; Kernel; Message passing; Protocols
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Cluster Computing and the Grid, 2004. CCGrid 2004. IEEE International Symposium on
  • Print_ISBN
    0-7803-8430-X
  • Type
    conf
  • DOI
    10.1109/CCGrid.2004.1336601
  • Filename
    1336601