• DocumentCode
    3549458
  • Title

    Design time reliability analysis of distributed fault tolerance algorithms

  • Author

    Latronico, Elizabeth ; Koopman, Philip

  • Author_Institution
    Carnegie Mellon Univ., Pittsburgh, PA, USA
  • fYear
    2005
  • fDate
    28 June-1 July 2005
  • Firstpage
    486
  • Lastpage
    495
  • Abstract
    Designing a distributed fault tolerance algorithm requires careful analysis of both fault models and diagnosis strategies. A system will fail if there are too many active faults, especially active Byzantine faults. But, a system will also fail if overly aggressive convictions leave inadequate redundancy. For high reliability, an algorithm's hybrid fault model and diagnosis strategy must be tuned to the types and rates of faults expected in the real world. We examine this balancing problem for two common types of distributed algorithms: clock synchronization and group membership. We show the importance of choosing a hybrid fault model appropriate for the physical faults expected by considering two clock synchronization algorithms. Three group membership service diagnosis strategies are used to demonstrate the benefit of discriminating between permanent and transient faults. In most cases, the probability of failure is dominated by one fault type. By identifying the dominant cause of failure, one can tailor an algorithm appropriately at design time, yielding significant reliability gain.
  • Keywords
    distributed algorithms; fault diagnosis; software reliability; synchronisation; Byzantine fault; clock synchronization; design time reliability analysis; distributed fault tolerance algorithm; failure probability; fault diagnosis; group membership service; hybrid fault model; Algorithm design and analysis; Automotive engineering; Clocks; Distributed algorithms; Fault diagnosis; Fault tolerance; Fault tolerant systems; Protocols; Redundancy; Synchronization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Dependable Systems and Networks, 2005. DSN 2005. Proceedings. International Conference on
  • Print_ISBN
    0-7695-2282-3
  • Type

    conf

  • DOI
    10.1109/DSN.2005.38
  • Filename
    1467823