DocumentCode :
1684845
Title :
Model-based fault localization in large-scale computing systems
Author :
Maruyama, Naoya ; Matsuoka, Satoshi
Author_Institution :
Tokyo Inst. of Technol., Tokyo
fYear :
2008
Firstpage :
1
Lastpage :
12
Abstract :
We propose a new fault localization technique for software bugs in large-scale computing systems. Our technique continuously collects per-process function call traces of a target system and uses them to derive a concise execution model that reflects the system's normal function-calling behavior. To find the cause of a failure, we compare the derived model with the traces collected when the system failed and compute a suspect score that quantifies how likely it is that a particular part of the call traces explains the failure. The execution model consists of a call probability for each function in the system, estimated from the normal traces. Functions with low probabilities in the model yield high anomaly scores when they are called during a failure; conversely, functions that are frequently called in the model yield high scores when they are not called. Finally, we report the function call sequences, ranked by their suspect scores, to a human analyst, narrowing further manual localization down to a small part of the overall system. We have applied the proposed method to localizing a known non-deterministic bug in a distributed parallel job manager. Experimental results in a three-site, 78-node distributed environment demonstrate that our method quickly locates an anomalous event that is highly correlated with the bug, indicating the effectiveness of our approach.
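The abstract outlines the scoring idea but does not give the exact estimator or scoring formulas. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: p(f) is taken to be the fraction of normal per-process traces that call function f at least once; a function called in the failed trace scores 1 - p(f), and a function expected but not called scores p(f); a "call sequence" is assumed to be a fixed-length window of consecutive calls, scored by its most anomalous member. All function names, the window length, and the sample traces are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# One process's function call trace: function names in call order.
Trace = Sequence[str]


def estimate_call_probabilities(normal_traces: List[Trace]) -> Dict[str, float]:
    """Assumed estimator: p(f) = fraction of normal traces that call f at least once."""
    if not normal_traces:
        return {}
    counts: Counter = Counter()
    for trace in normal_traces:
        counts.update(set(trace))  # count each function at most once per trace
    n = len(normal_traces)
    return {f: c / n for f, c in counts.items()}


def anomaly_scores(model: Dict[str, float], failed_trace: Trace) -> Dict[str, float]:
    """Assumed scoring: rare functions that were called, and frequent
    functions that were not called, both receive high scores."""
    called = set(failed_trace)
    scores: Dict[str, float] = {}
    for f in called:
        scores[f] = 1.0 - model.get(f, 0.0)  # never seen in normal runs -> 1.0
    for f, p in model.items():
        if f not in called:
            scores[f] = p  # expected with probability p, but missing
    return scores


def rank_sequences(failed_trace: Trace, scores: Dict[str, float],
                   window: int = 3) -> List[Tuple[float, Tuple[str, ...]]]:
    """Rank hypothetical fixed-length call windows by their most anomalous call."""
    if not failed_trace:
        return []
    ranked = []
    for i in range(max(1, len(failed_trace) - window + 1)):
        seq = tuple(failed_trace[i:i + window])
        ranked.append((max(scores[f] for f in seq), seq))
    # sort() is stable, so windows with equal scores keep their trace order
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked


# Hypothetical example: four normal process traces and one failed trace.
normal_traces = [
    ["main", "init", "send", "recv", "exit"],
    ["main", "init", "send", "recv", "exit"],
    ["main", "init", "recv", "send", "exit"],
    ["main", "init", "send", "recv", "retry", "exit"],
]
failed = ["main", "init", "send", "timeout_handler", "abort"]

model = estimate_call_probabilities(normal_traces)
scores = anomaly_scores(model, failed)
ranked = rank_sequences(failed, scores, window=3)

if __name__ == "__main__":
    for score, seq in ranked:
        print(f"{score:.2f}  {' -> '.join(seq)}")
    missing = sorted((f for f in model if f not in set(failed)),
                     key=lambda f: scores[f], reverse=True)
    print("expected but not called:", missing)
```

In this toy run, the windows containing `timeout_handler`, which never appears in the normal traces, rank highest, while `recv` and `exit` are flagged as expected-but-missing. Missing functions cannot appear inside a window of the failed trace, so the sketch reports them separately; how the original method combines the two kinds of evidence into one ranking is not stated in the abstract.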
Keywords :
fault location; parallel processing; program debugging; software fault tolerance; call probability; distributed parallel job manager; function call sequences; large-scale computing systems; model-based fault localization; per-process function call traces; software bugs; Computer bugs; Distributed computing; Fault detection; Humans; Informatics; Large-scale systems; Scalability; Software architecture; Standards; Testing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel and Distributed Processing, 2008. IPDPS 2008. IEEE International Symposium on
Conference_Location :
Miami, FL
ISSN :
1530-2075
Print_ISBN :
978-1-4244-1693-6
Electronic_ISBN :
1530-2075
Type :
conf
DOI :
10.1109/IPDPS.2008.4536310
Filename :
4536310