DocumentCode :
2360423
Title :
Identifying faults in large-scale distributed systems by filtering noisy error logs
Author :
Rao, Xiang ; Wang, Huaimin ; Shi, Dianxi ; Chen, Zhenbang ; Cai, Hua ; Zhou, Qi ; Sun, Tingtao
Author_Institution :
Nat. Lab. for Parallel & Distrib. Process., Nat. Univ. of Defense Technol., Changsha, China
fYear :
2011
fDate :
27-30 June 2011
Firstpage :
140
Lastpage :
145
Abstract :
Extracting fault features from the error logs of fault injection tests has been widely studied in the area of large-scale distributed systems for decades. However, the feature extraction process is severely affected by a large amount of noisy logs. While existing work tries to solve the problem by compressing logs in temporal and spatial views or by removing the semantic redundancy between logs, it fails to consider the co-existence of other noisy faults that generate error logs alongside the injected faults, for example, random hardware faults, unexpected software bugs, system configuration faults or the error rank of a log severity. During fault feature extraction, those noisy faults generate error logs that are not related to the target fault and strongly distort the resulting fault features. We call an error log that is not related to a target fault a noisy error log. To filter out noisy error logs, we present a similarity-based error log filtering method, SBF, which consists of three integrated steps: (1) model error logs as time series and use the Haar wavelet transform to obtain approximate time series; (2) divide the approximate time series into sub time series at valleys; (3) identify noisy error logs by comparing the similarity between the sub time series of target error logs and the template of noisy error logs. We apply our log filtering method to an enterprise cloud system and show its effectiveness. Compared with existing work, we successfully filter out noisy error logs and increase the precision and recall of fault feature extraction.
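The three steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' SBF implementation: the abstract does not specify the similarity measure, the valley rule, or the template format, so the cosine similarity, the local-minimum test, the linear resampling, the `threshold` value and all function names below are assumptions made only for illustration.

```python
import math


def haar_approximation(series, levels=1):
    """Step 1: approximate a time series of error-log counts.

    Each level replaces adjacent pairs by their average (unnormalized
    Haar scaling coefficients); an odd trailing sample is carried over.
    """
    out = list(series)
    for _ in range(levels):
        paired = [(out[i] + out[i + 1]) / 2 for i in range(0, len(out) - 1, 2)]
        if len(out) % 2:
            paired.append(out[-1])
        out = paired
    return out


def split_by_valleys(series):
    """Step 2: cut the series at local minima (assumed valley rule).

    Neighbouring segments share the valley sample at their boundary.
    """
    if not series:
        return []
    cuts = [i for i in range(1, len(series) - 1)
            if series[i - 1] > series[i] <= series[i + 1]]
    bounds = [0] + cuts + [len(series) - 1]
    return [series[a:b + 1] for a, b in zip(bounds, bounds[1:])]


def resample(series, n):
    """Linearly interpolate `series` onto `n` evenly spaced points."""
    if len(series) == 1 or n == 1:
        return [float(series[0])] * n
    out = []
    for k in range(n):
        pos = k * (len(series) - 1) / (n - 1)
        lo = int(pos)
        hi = min(lo + 1, len(series) - 1)
        frac = pos - lo
        out.append(series[lo] * (1 - frac) + series[hi] * frac)
    return out


def cosine_similarity(a, b):
    """Assumed similarity measure; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def flag_noisy_segments(counts, noise_template, threshold=0.9, levels=1):
    """Step 3: mark segments whose shape resembles the noise template."""
    approx = haar_approximation(counts, levels)
    results = []
    for segment in split_by_valleys(approx):
        shape = resample(segment, len(noise_template))
        score = cosine_similarity(shape, noise_template)
        results.append((segment, score >= threshold))
    return results
```

Calling `flag_noisy_segments(counts, noise_template)` with per-interval error-log counts and a hypothetical noise template returns each sub time series paired with a boolean noise flag. Cosine similarity on raw counts ignores amplitude, so a real deployment would likely need a measure and threshold tuned to its own logs.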
Keywords :
cloud computing; data compression; distributed processing; fault tolerant computing; feature extraction; system monitoring; time series; wavelet transforms; data compression; enterprise cloud system; fault feature extraction; fault identification; fault injection tests; large scale distributed systems; noisy error log filtering; semantic redundancy; time series; wavelet transform; Approximation methods; Complexity theory; Computer crashes; Feature extraction; Noise measurement; Time series analysis; Wavelet transforms; error log; event filtering; fault injection; large scale distributed system;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Dependable Systems and Networks Workshops (DSN-W), 2011 IEEE/IFIP 41st International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
978-1-4577-0374-4
Electronic_ISBN :
978-1-4577-0373-7
Type :
conf
DOI :
10.1109/DSNW.2011.5958800
Filename :
5958800