• DocumentCode
    2436435
  • Title

    Performance evaluation of fault tolerance for parallel applications in networked environments

  • Author

    Sens, Pierre ; Folliot, Bertil

  • Author_Institution
    Paris VI Univ., France
  • fYear
    1997
  • fDate
    11-15 Aug 1997
  • Firstpage
    334
  • Lastpage
    341
  • Abstract
    This paper presents the performance evaluation of a software fault manager for distributed applications. Dubbed STAR, it uses the natural redundancy existing in networks of workstations to offer a high level of fault tolerance. Fault management is transparent to the supported parallel applications. STAR is application independent, highly configurable and easily portable to UNIX-like operating systems. The current implementation is based on independent checkpointing and message logging. Measurements show the efficiency and the limits of this implementation. The challenge is to show that a software approach to fault tolerance can be implemented efficiently in a standard networked environment.
  • Keywords
    distributed processing; fault tolerant computing; performance evaluation; system recovery; STAR; fault management; fault tolerance; independent checkpointing; message logging; networked environments; parallel applications; software fault manager; Application software; Buffer storage; Checkpointing; Environmental management; Fault tolerance; Hardware; Intelligent networks; Operating systems; Redundancy; Software performance
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Proceedings of the 1997 International Conference on Parallel Processing
  • Conference_Location
    Bloomington, IL
  • ISSN
    0190-3918
  • Print_ISBN
    0-8186-8108-X
  • Type

    conf

  • DOI
    10.1109/ICPP.1997.622663
  • Filename
    622663