DocumentCode
2436435
Title
Performance evaluation of fault tolerance for parallel applications in networked environments
Author
Sens, Pierre ; Folliot, Bertil
Author_Institution
Paris VI Univ., France
fYear
1997
fDate
11-15 Aug 1997
Firstpage
334
Lastpage
341
Abstract
This paper presents the performance evaluation of a software fault manager for distributed applications. Dubbed STAR, it uses the natural redundancy existing in networks of workstations to offer a high level of fault tolerance. Fault management is transparent to the supported parallel applications. STAR is application independent, highly configurable and easily portable to UNIX-like operating systems. The current implementation is based on independent checkpointing and message logging. Measurements show the efficiency and the limits of this implementation. The challenge is to show that a software approach to fault tolerance can efficiently be implemented in a standard networked environment.
Keywords
distributed processing; fault tolerant computing; performance evaluation; system recovery; STAR; fault management; fault tolerance; independent checkpointing; message logging; networked environments; parallel applications; performance evaluation; software fault manager; Application software; Buffer storage; Checkpointing; Environmental management; Fault tolerance; Hardware; Intelligent networks; Operating systems; Redundancy; Software performance;
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel Processing, 1997., Proceedings of the 1997 International Conference on
Conference_Location
Bloomington, IL
ISSN
0190-3918
Print_ISBN
0-8186-8108-X
Type
conf
DOI
10.1109/ICPP.1997.622663
Filename
622663