• DocumentCode
    1114889
  • Title
    A Parsimonious Approach for Obtaining Resource-Efficient and Trustworthy Execution
  • Author
    Ramasamy, HariGovind V.; Agbaria, Adnan; Sanders, William H.
  • Author_Institution
    IBM Zurich Res. Lab., Ruschlikon
  • Volume
    4
  • Issue
    1
  • fYear
    2007
  • Firstpage
    1
  • Lastpage
    17
  • Abstract
    We propose a resource-efficient way to execute requests in Byzantine-fault-tolerant replication that is particularly well suited to services in which request processing is resource-intensive. Previous efforts took a failure-masking, all-active approach, using all execution replicas to execute all requests; at least 2t + 1 execution replicas are needed to mask t Byzantine-faulty ones. We describe an asynchronous protocol that provides resource-efficient execution by combining failure masking with imperfect failure detection and checkpointing. Our protocol is parsimonious because it uses only t + 1 execution replicas, called the primary committee (PC), to execute requests under normal conditions, characterized by a stable network and no misbehavior by PC replicas. Thus, a trustworthy reply can be obtained with the same latency as the all-active approach, but with only about half of its overall resource use. However, a request that exposes faults among the PC replicas causes the protocol to switch to a recovery mode, in which all 2t + 1 replicas execute the request and send their replies; then, after selecting a new PC, the protocol switches back to parsimonious execution. Such a request incurs higher latency under our approach than under the all-active approach, mainly because of fault detection latency. Practical observations indicate that failures and instability are the exception rather than the norm. That motivated our decision to optimize resource efficiency for the common case, even if it means paying a slightly higher performance cost during periods of instability.
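    The mode switch summarized in the abstract can be illustrated with a minimal Python sketch, under simplifying assumptions that are not from the paper: replicas are deterministic, at most t are faulty, a missing reply (None) stands in for what an imperfect failure detector would flag, and checkpointing and agreement on the new PC are omitted. The names `handle_request`, `execution_cost_ratio`, and `execute`, as well as the new-PC selection rule, are hypothetical. The sketch is not the authors' protocol. The ratio (t + 1)/(2t + 1) is 2/3 for t = 1 and approaches 1/2 as t grows.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple


def execution_cost_ratio(t: int) -> float:
    """Replicas executing per request: parsimonious (t + 1) vs. all-active (2t + 1)."""
    return (t + 1) / (2 * t + 1)


def handle_request(
    request: str,
    t: int,
    replicas: List[int],
    pc: List[int],
    execute: Callable[[int, str], Optional[str]],
) -> Tuple[str, List[int], bool]:
    """Return (reply, next_pc, used_recovery) for one request.

    execute(r, request) yields replica r's reply, or None to model a
    missing reply (hypothetical stand-in for a failure-detector timeout).
    """
    assert len(replicas) == 2 * t + 1 and len(pc) == t + 1

    # Parsimonious mode: only the t + 1 PC replicas execute the request.
    pc_replies = [execute(r, request) for r in pc]
    if None not in pc_replies and len(set(pc_replies)) == 1:
        # With at most t faulty replicas, t + 1 matching replies include
        # at least one from a correct replica, so the reply is trustworthy.
        return pc_replies[0], pc, False

    # Recovery mode: a fault was exposed, so all 2t + 1 replicas execute.
    replies: Dict[int, Optional[str]] = {r: execute(r, request) for r in replicas}
    counts = Counter(v for v in replies.values() if v is not None)
    top = counts.most_common(1)
    if not top or top[0][1] < t + 1:
        raise RuntimeError("fewer than t + 1 matching replies")
    reply = top[0][0]

    # Hypothetical new-PC rule: the first t + 1 replicas whose reply matched.
    new_pc = [r for r in replicas if replies[r] == reply][: t + 1]
    return reply, new_pc, True
```

    For example, with t = 1, replicas [0, 1, 2], and replica 1 faulty, a request sent to PC [0, 1] triggers recovery and yields a new PC of [0, 2]; the next request is then handled parsimoniously by [0, 2] alone.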
  • Keywords
    checkpointing; fault tolerant computing; protocols; resource allocation; supervisory programs; Byzantine-fault-tolerant replication; asynchronous protocol; execution replica; failure detection; failure masking; fault detection latency; parsimonious execution; primary committee; request processing; resource efficiency optimization; resource-efficient request execution; trustworthy execution; Checkpointing; Computer errors; Cost function; Delay; Fault detection; Fault tolerance; Fault tolerant systems; Information systems; Protocols; Switches; Byzantine faults; Distributed systems; fault tolerance
  • fLanguage
    English
  • Journal_Title
    Dependable and Secure Computing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5971
  • Type
    jour
  • DOI
    10.1109/TDSC.2007.2
  • Filename
    4099188