DocumentCode
2055632
Title
Building reliable distributed programs with file operations
Author
Ouyang, Jinsong ; Maheshwari, Piyush
Author_Institution
Dept. of Comput. Sci., Toronto Univ., Ont., Canada
fYear
1997
fDate
18-21 Dec 1997
Firstpage
380
Lastpage
385
Abstract
Describes a new protocol that helps users build reliable distributed applications with file operations. Our file checkpointing and recovery protocol is designed to checkpoint and recover user files consistently with respect to the volatile state of the distributed program. Based on the protocol, a file I/O interface has been implemented as part of our Libra library, which supports fault tolerance in distributed applications. File operations are performed through this interface, while the complexity of checkpointing and recovering user files is hidden from the application level: the checkpointing and recovery of user files are done automatically.
Keywords
distributed algorithms; file organisation; memory protocols; parallel programming; software fault tolerance; software libraries; software reliability; system recovery; Libra library; distributed applications; fault tolerance; file I/O interface; file checkpointing; file operations; file recovery; protocol; reliable distributed program construction; user files; volatile program state; Application software; Buildings; Checkpointing; Computer science; Fault tolerance; Fault tolerant systems; Protocols; Runtime; Shadow mapping; Software libraries
fLanguage
English
Publisher
IEEE
Conference_Titel
High-Performance Computing, 1997. Proceedings. Fourth International Conference on
Conference_Location
Bangalore
Print_ISBN
0-8186-8067-9
Type
conf
DOI
10.1109/HIPC.1997.634518
Filename
634518