DocumentCode :
395586
Title :
Software fault tolerance of distributed programs using computation slicing
Author :
Mittal, Neeraj ; Garg, Vijay K.
Author_Institution :
Dept. of Comput. Sci., Texas Univ. at Dallas, Richardson, TX, USA
fYear :
2003
fDate :
19-22 May 2003
Firstpage :
105
Lastpage :
113
Abstract :
Writing correct distributed programs is hard. In spite of extensive testing and debugging, software faults persist even in commercial-grade software. Many distributed systems, especially those employed in safety-critical environments, should be able to operate properly even in the presence of software faults. Monitoring the execution of a distributed system and, on detecting a fault, initiating the appropriate corrective action is an important way to tolerate such faults. This gives rise to the predicate detection problem, which involves finding a consistent cut of a distributed computation, if one exists, that satisfies the given global predicate. Detecting a predicate in a computation is, however, an NP-complete problem. To ameliorate the associated combinatorial explosion, we introduced the notion of a computation slice in our earlier papers [5, 10]. Intuitively, a slice is a concise representation of those consistent cuts that satisfy a certain condition. To detect a predicate, rather than searching the state-space of the computation, it is much more efficient to search the state-space of the slice. In this paper, we provide efficient algorithms to compute the slice for several classes of predicates. Our experimental results demonstrate that slicing can lead to an exponential improvement over existing techniques in terms of time and space.
Keywords :
computational complexity; distributed algorithms; program debugging; program slicing; software fault tolerance; computation slicing; distributed program; partial-order method; predicate detection; search-space pruning; software debugging; software testing; software-fault tolerance; Distributed computing; Explosions; Fault detection; Fault tolerance; Monitoring; NP-complete problem; Software debugging; Software safety; Software testing; Writing
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Distributed Computing Systems, 2003. Proceedings. 23rd International Conference on
ISSN :
1063-6927
Print_ISBN :
0-7695-1920-2
Type :
conf
DOI :
10.1109/ICDCS.2003.1203457
Filename :
1203457