DocumentCode :
2446408
Title :
Diagnosing permanent faults in distributed and parallel computing systems using artificial neural networks
Author :
Elhadef, Mourad
Author_Institution :
Coll. of Eng. & Comput. Sci., Abu Dhabi Univ., Abu Dhabi, United Arab Emirates
fYear :
2010
fDate :
19-23 April 2010
Firstpage :
1
Lastpage :
8
Abstract :
This paper addresses the problem of identifying faulty nodes (or units) in diagnosable distributed and parallel systems under the PMC model. In this model, each unit is tested by a subset of the other units, and it is assumed that at most t of the units are permanently faulty. When performing tests, faulty units can report incorrect results: they may claim that fault-free units are faulty or that faulty units are fault-free. Since the introduction of the PMC model, significant progress has been made in both the theory and the practice associated with the original model and its offshoots. Nevertheless, efficiently identifying the set of faulty units of a diagnosable system has remained an outstanding research issue. In this paper, we describe a new neural-network-based diagnosis algorithm that exploits the off-line learning phase of artificial neural networks to speed up diagnosis. The approach has been implemented and evaluated using randomly generated diagnosable systems. The simulation results show that the new neural-network-based fault identification approach constitutes an addition to existing diagnosis algorithms. Extreme fault situations, where the number of faults is close to the bound t, and large diagnosable systems were also tested to show the efficiency of the new diagnosis algorithm.
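The PMC testing assumptions described in the abstract can be illustrated with a minimal Python sketch. It is not the neural-network-based algorithm of the paper, whose details are not given in this record; it only simulates test outcomes under the PMC model and recovers the fault set by exhaustive search, which is practical for very small systems only. The function names, the five-unit test assignment (unit i tests units i+1 and i+2 modulo 5) and the chosen fault set are assumptions made for illustration.

```python
import itertools
import random


def pmc_syndrome(tests, faulty, rng):
    """Simulate test outcomes under the PMC model.

    tests  : list of (tester, tested) pairs
    faulty : set of permanently faulty units
    Outcome 1 means "tested unit reported faulty", 0 means "reported fault-free".
    """
    syndrome = {}
    for tester, tested in tests:
        if tester in faulty:
            # A faulty tester is unreliable: its report is arbitrary.
            syndrome[(tester, tested)] = rng.randint(0, 1)
        else:
            # A fault-free tester always reports the true state.
            syndrome[(tester, tested)] = 1 if tested in faulty else 0
    return syndrome


def is_consistent(candidate, syndrome):
    """Check that every test made by a unit outside `candidate` matches it."""
    for (tester, tested), outcome in syndrome.items():
        if tester not in candidate and outcome != (1 if tested in candidate else 0):
            return False
    return True


def brute_force_diagnosis(n, syndrome, t):
    """Return all fault sets of size at most t that are consistent with the syndrome."""
    return [
        set(combo)
        for size in range(t + 1)
        for combo in itertools.combinations(range(n), size)
        if is_consistent(set(combo), syndrome)
    ]


# Hypothetical example: 5 units, unit i tests units i+1 and i+2 (mod 5), bound t = 2.
n, t = 5, 2
tests = [(i, (i + d) % n) for i in range(n) for d in (1, 2)]
faulty = {1, 3}
syndrome = pmc_syndrome(tests, faulty, random.Random(0))
candidates = brute_force_diagnosis(n, syndrome, t)
print(candidates)
```

The exhaustive search examines every subset of at most t units, so its cost grows combinatorially with the system size. That cost is the kind of inefficiency a faster diagnosis algorithm, such as the one the abstract describes, aims to avoid.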
Keywords :
fault diagnosis; neural nets; parallel processing; program testing; software fault tolerance; artificial neural networks; distributed computing systems; fault identification; offline learning phase; parallel computing systems; permanent faults diagnosis; Artificial neural networks; Computer science; Educational institutions; Fault diagnosis; Fault tolerance; Military computing; Parallel processing; Performance evaluation; Signal processing algorithms; Testing; Distributed and parallel systems; Fault tolerance; Invalidation model; Neural networks; System-level diagnosis;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW), 2010 IEEE International Symposium on
Conference_Location :
Atlanta, GA
Print_ISBN :
978-1-4244-6533-0
Type :
conf
DOI :
10.1109/IPDPSW.2010.5470695
Filename :
5470695