• DocumentCode
    580143
  • Title
    Real-Time Anomaly Detection in Streams of Execution Traces
  • Author
    Zhang, Wenke; Bastani, Favyen; Yen, I-Ling; Hulin, Kevin; Bastani, Farokh; Khan, Latifur
  • fYear
    2012
  • fDate
    25-27 Oct. 2012
  • Firstpage
    32
  • Lastpage
    39
  • Abstract
    For deployed systems, software fault detection can be challenging. Generally, faulty behaviors are detected based on execution logs, which may contain a large volume of execution traces, making analysis extremely difficult. This paper investigates and compares the effectiveness and efficiency of various data mining techniques for software fault detection based on execution logs, including clustering-based, density-based, and probabilistic-automata-based methods. However, some existing algorithms suffer from high complexity and do not scale well to large datasets. To address this problem, we present a suite of prefix-tree-based anomaly detection techniques. The prefix tree model serves as a compact, lossless data representation of execution traces. Also, the prefix tree distance metric provides an effective heuristic to guide the search for execution traces in close proximity to each other. In the density-based algorithm, the prefix tree distance is used to confine the K-nearest neighbor search to a small subset of the nodes, which greatly reduces computing time without sacrificing accuracy. Experimental studies show a significant speedup in our prefix-tree-based and prefix-tree-distance-guided approaches, from days to minutes in the best cases, in automated identification of software failures.
  • Keywords
    data mining; pattern classification; pattern clustering; probabilistic automata; program diagnostics; software fault tolerance; tree data structures; K-nearest neighbor search; compact lossless data representation; data mining techniques; density based methods; execution logs; execution traces; high complexity; prefix tree based anomaly detection techniques; prefix tree distance metric model; probabilistic automata based methods; real-time anomaly detection; software failures; software fault detection; Algorithm design and analysis; Automata; Clustering algorithms; Data models; Probabilistic logic; Software; Software algorithms; Anomaly detection; k-medoids clustering; local outlier factor; prefix tree; probabilistic automata
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High-Assurance Systems Engineering (HASE), 2012 IEEE 14th International Symposium on
  • Conference_Location
    Omaha, NE
  • ISSN
    1530-2059
  • Print_ISBN
    978-1-4673-4742-6
  • Type
    conf
  • DOI
    10.1109/HASE.2012.13
  • Filename
    6375634