Title :
A Parsimonious Approach for Obtaining Resource-Efficient and Trustworthy Execution
Author :
Ramasamy, HariGovind V. ; Agbaria, Adnan ; Sanders, William H.
Author_Institution :
IBM Zurich Res. Lab., Rüschlikon
Abstract :
We propose a resource-efficient way to execute requests in Byzantine-fault-tolerant replication that is particularly well-suited for services in which request processing is resource-intensive. Previous efforts took a failure-masking, all-active approach of using all execution replicas to execute all requests; at least 2t + 1 execution replicas are needed to mask t Byzantine-faulty ones. We describe an asynchronous protocol that provides resource-efficient execution by combining failure masking with imperfect failure detection and checkpointing. Our protocol is parsimonious since it uses only t + 1 execution replicas, called the primary committee or PC, to execute the requests under normal conditions characterized by a stable network and no misbehavior by PC replicas; thus, a trustworthy reply can be obtained with the same latency, but with only about half of the overall resource use of the all-active approach. However, a request that exposes faults among the PC replicas causes the protocol to switch to a recovery mode, in which all 2t + 1 replicas execute the request and send their replies; then, after selecting a new PC, the protocol switches back to parsimonious execution. Such a request incurs a higher latency using our approach than the all-active approach, mainly because of fault detection latency. Practical observations point to the fact that failures and instability are the exception rather than the norm. That motivated our decision to optimize resource efficiency for the common case, even if it means paying a slightly higher performance cost during periods of instability.
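The switch between parsimonious and recovery execution described above can be sketched as a small synchronous simulation. This is an illustration under simplifying assumptions, not the authors' protocol: replicas are modeled as deterministic functions, the names `execute_parsimonious`, `Replica`, `honest`, and `faulty` are hypothetical, a fault is exposed only by mismatching PC replies, and asynchrony, timeouts (imperfect failure detection), request ordering, and checkpoint-based state transfer are not modeled.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical model: a replica is a deterministic function from request to reply.
# Faulty replicas may return arbitrary replies. Timeouts, request ordering, and
# checkpoint-based state transfer are NOT modeled in this sketch.
Replica = Callable[[str], str]


def execute_parsimonious(
    replicas: Dict[int, Replica], pc: List[int], request: str, t: int
) -> Tuple[str, List[int], int]:
    """Return (reply, next_pc, executions) for one request.

    Normal mode: only the t + 1 primary-committee (PC) replicas execute.
    If all their replies match, at least one came from a correct replica
    (at most t are faulty), so the reply is trustworthy.
    """
    assert len(replicas) == 2 * t + 1 and len(pc) == t + 1
    replies = {r: replicas[r](request) for r in pc}
    if len(set(replies.values())) == 1:
        return next(iter(replies.values())), pc, len(replies)

    # Recovery mode: a mismatch exposed a fault, so the remaining replicas also
    # execute, giving 2t + 1 replies. Correct replicas (at least t + 1 of them)
    # agree, and at most t faulty ones cannot outvote them.
    others = [r for r in replicas if r not in replies]
    replies.update({r: replicas[r](request) for r in others})
    value, votes = Counter(replies.values()).most_common(1)[0]
    assert votes >= t + 1, "more than t faulty replicas"

    # Pick a new PC from replicas whose reply matched the majority value.
    new_pc = sorted(r for r, v in replies.items() if v == value)[: t + 1]
    return value, new_pc, len(replies)


honest: Replica = lambda req: req.upper()
faulty: Replica = lambda req: "bogus"

# Fault-free PC: only t + 1 = 2 executions for t = 1.
print(execute_parsimonious({0: honest, 1: honest, 2: honest}, [0, 1], "ping", 1))
# Faulty PC member 1: recovery uses all 3 replicas and replica 1 is replaced.
print(execute_parsimonious({0: honest, 1: faulty, 2: honest}, [0, 1], "ping", 1))
```

In the fault-free case each request costs t + 1 executions instead of 2t + 1 (2 versus 3 for t = 1, 4 versus 7 for t = 3), a ratio that approaches one half as t grows. Note that a replica whose reply matched the majority may still be faulty; the sketch only guarantees that the replaced replica misbehaved on this request.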
Keywords :
checkpointing; fault tolerant computing; protocols; resource allocation; supervisory programs; Byzantine-fault-tolerant replication; asynchronous protocol; execution replica; failure detection; failure masking; fault detection latency; parsimonious execution; primary committee; request processing; resource efficiency optimization; resource-efficient request execution; trustworthy execution; Checkpointing; Computer errors; Cost function; Delay; Fault detection; Fault tolerance; Fault tolerant systems; Information systems; Protocols; Switches; Byzantine faults; Distributed systems; fault tolerance
Journal_Title :
Dependable and Secure Computing, IEEE Transactions on
DOI :
10.1109/TDSC.2007.2