Real-Time Anomaly Detection in Streams of Execution Traces

Author

Zhang, Wenke ; Bastani, Favyen ; Yen, I-Ling ; Hulin, Kevin ; Bastani, Farokh ; Khan, Latifur

fYear

2012

fDate

25-27 Oct. 2012

Firstpage

32

Lastpage

39

Abstract

For deployed systems, software fault detection can be challenging. Generally, faulty behaviors are detected based on execution logs, which may contain a large volume of execution traces, making analysis extremely difficult. This paper investigates and compares the effectiveness and efficiency of various data mining techniques for software fault detection based on execution logs, including clustering-based, density-based, and probabilistic-automata-based methods. However, some existing algorithms suffer from high complexity and do not scale well to large datasets. To address this problem, we present a suite of prefix-tree-based anomaly detection techniques. The prefix tree model serves as a compact, lossless data representation of execution traces. In addition, the prefix tree distance metric provides an effective heuristic to guide the search for execution traces that are in close proximity to each other. In the density-based algorithm, the prefix tree distance is used to confine the K-nearest-neighbor search to a small subset of the nodes, which greatly reduces the computing time without sacrificing accuracy. Experimental studies show a significant speedup in our prefix-tree-based and prefix-tree-distance-guided approaches, from days to minutes in the best cases, in automated identification of software failures.
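
The idea of a prefix tree as a compact, lossless store for execution traces can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class names, the sample traces, and in particular tree_path_distance are assumptions made for illustration, since the abstract does not define the paper's prefix tree distance metric. The distance shown here is simply the number of tree edges between the end nodes of two traces.

```python
from collections import Counter


class PrefixTreeNode:
    def __init__(self):
        self.children = {}   # event name -> child node
        self.end_count = 0   # number of traces that end exactly at this node


class PrefixTree:
    """Stores traces so that shared prefixes are kept only once."""

    def __init__(self):
        self.root = PrefixTreeNode()
        self.node_count = 1  # the root

    def insert(self, trace):
        node = self.root
        for event in trace:
            if event not in node.children:
                node.children[event] = PrefixTreeNode()
                self.node_count += 1
            node = node.children[event]
        node.end_count += 1

    def traces(self):
        """Yield (trace, count) pairs, recovering the inserted multiset."""
        stack = [(self.root, ())]
        while stack:
            node, prefix = stack.pop()
            if node.end_count:
                yield prefix, node.end_count
            for event, child in node.children.items():
                stack.append((child, prefix + (event,)))


def tree_path_distance(a, b):
    """Hypothetical distance (an assumption, not the paper's metric):
    edges on the tree path between the end nodes of traces a and b."""
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return len(a) + len(b) - 2 * shared


# Made-up example traces; event names are illustrative only.
logs = [
    ("open", "read", "close"),
    ("open", "read", "close"),
    ("open", "write", "close"),
    ("open", "read", "read", "close"),
]
tree = PrefixTree()
for t in logs:
    tree.insert(t)
```

In this toy input the four traces contain 13 events in total, while the tree needs 8 nodes including the root, and tree.traces() returns every distinct trace with its multiplicity. Under the assumed distance, traces that share a long prefix sit close together in the tree, which suggests how a tree-based distance could restrict a nearest-neighbor search to a nearby subset of nodes.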

Keywords

data mining; pattern classification; pattern clustering; probabilistic automata; program diagnostics; software fault tolerance; tree data structures; K-nearest neighbor search; compact lossless data representation; data mining techniques; density based methods; execution logs; execution traces; high complexity; prefix tree based anomaly detection techniques; prefix tree distance metric model; probabilistic automata based methods; real-time anomaly detection; software failures; software fault detection; Algorithm design and analysis; Automata; Clustering algorithms; Data models; Probabilistic logic; Software; Software algorithms; Anomaly detection; k-medoids clustering; local outlier factor; prefix tree; probabilistic automata

fLanguage

English

Publisher

IEEE

Conference_Titel

High-Assurance Systems Engineering (HASE), 2012 IEEE 14th International Symposium on

Conference_Location

Omaha, NE

ISSN

1530-2059

Print_ISBN

978-1-4673-4742-6

Type

conf

DOI

10.1109/HASE.2012.13

Filename

6375634