• DocumentCode
    609950
  • Title
    Automatically Tolerating Arbitrary Faults in Non-malicious Settings
  • Author
    Behrens, Diogo ; Weigert, S. ; Fetzer, Christof
  • Author_Institution
    Syst. Eng. Group, Tech. Univ. Dresden, Dresden, Germany
  • fYear
    2013
  • fDate
    1-5 April 2013
  • Firstpage
    114
  • Lastpage
    123
  • Abstract
    Arbitrary faults such as bit flips have often been observed in commodity-hardware data centers and have disrupted large services. Benign faults, such as crashes and message omissions, are nevertheless the standard assumption in practical fault-tolerant distributed systems. Algorithms tolerant to arbitrary faults are harder to understand and more expensive to deploy (requiring more machines). In this work, we introduce a non-malicious arbitrary fault model including transient and permanent arbitrary faults, such as bit flips and hardware-design errors, but no malicious faults, typically caused by security breaches. We then present a compiler-based framework that allows benign fault-tolerant algorithms to automatically tolerate arbitrary faults in non-malicious settings. Finally, we experimentally evaluate two fundamental algorithms: Paxos and leader election. At the expense of CPU cycles, transformed algorithms use the same number of processes as their benign fault-tolerant counterparts, and have virtually no network overhead, while reducing the probability of failing arbitrarily by two orders of magnitude.
  • Keywords
    computer centres; fault tolerant computing; Paxos algorithm; arbitrary fault tolerance; benign fault; bit flip; commodity-hardware data center; crash fault; fault tolerant distributed system; hardware design error; leader election algorithm; message omission fault; nonmalicious arbitrary fault model; permanent arbitrary fault; transient arbitrary fault; Computer crashes; Distributed algorithms; Encoding; Fault tolerance; Fault tolerant systems; Hardware; Transforms; Byzantine faults; algorithm transformation; arbitrary faults; fault tolerance; hardware errors;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Dependable Computing (LADC), 2013 Sixth Latin-American Symposium on
  • Conference_Location
    Rio de Janeiro
  • Print_ISBN
    978-1-4673-5746-3
  • Type
    conf
  • DOI
    10.1109/LADC.2013.26
  • Filename
    6542613