• DocumentCode
    244989
  • Title

    RS-Forest: A Rapid Density Estimator for Streaming Anomaly Detection

  • Author

    Ke Wu ; Kun Zhang ; Wei Fan ; Edwards, Andrea ; Yu, Philip S.

  • fYear
    2014
  • fDate
    14-17 Dec. 2014
  • Firstpage
    600
  • Lastpage
    609
  • Abstract
    Anomaly detection in streaming data is of high interest in numerous application domains. In this paper, we propose a novel one-class semi-supervised algorithm to detect anomalies in streaming data. Underlying the algorithm is a fast and accurate density estimator implemented by multiple fully randomized space trees (RS-Trees), named RS-Forest. The piecewise constant density estimate of each RS-Tree is defined on the tree node into which an instance falls. Each incoming instance in a data stream is scored by the density estimates averaged over all trees in the forest. Two strategies, statistical attribute range estimation with a high-probability guarantee and dual node profiles for rapid model update, are seamlessly integrated into RS-Forest to systematically address the ever-evolving nature of data streams. We derive the theoretical upper bound for the proposed algorithm and analyze its asymptotic properties via bias-variance decomposition. Empirical comparisons to the state-of-the-art methods on multiple benchmark datasets demonstrate that the proposed method features a high detection rate, fast response, and insensitivity to most of the parameter settings. Algorithm implementations and datasets are available upon request.
  • Keywords
    data mining; learning (artificial intelligence); security of data; statistical analysis; trees (mathematics); RS-forest; RS-trees; anomaly detection; asymptotic property; bias-variance decomposition; dual node profile; one-class semisupervised algorithm; piecewise constant density; randomized space trees; rapid density estimator; statistical attribute range estimation; streaming data; Benchmark testing; Data models; Detectors; Estimation; Predictive models; Upper bound; Vegetation; Anomaly detection; data streams; density estimation; ensembles; streaming data;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining (ICDM), 2014 IEEE International Conference on
  • Conference_Location
    Shenzhen
  • ISSN
    1550-4786
  • Print_ISBN
    978-1-4799-4303-6
  • Type

    conf

  • DOI
    10.1109/ICDM.2014.45
  • Filename
    7023377
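
The scoring scheme described in the abstract (a piecewise constant density per tree, averaged over all trees in the forest) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class names `RSTree` and `RSForest`, the fixed bounding box, the tree depth, and the leaf density of count / (n * volume) are assumptions chosen for illustration, and the attribute range estimation and dual node profiles for stream updates are omitted.

```python
import random


class RSTree:
    """One fully randomized space tree over a fixed bounding box (illustrative)."""

    def __init__(self, lows, highs, depth, rng):
        self.lows, self.highs = list(lows), list(highs)
        self.count = 0                  # training instances that reached this node
        self.left = self.right = None
        if depth > 0:
            # Pick a random attribute and a random cut point inside its range.
            self.dim = rng.randrange(len(self.lows))
            self.cut = rng.uniform(self.lows[self.dim], self.highs[self.dim])
            left_highs = list(self.highs)
            left_highs[self.dim] = self.cut
            right_lows = list(self.lows)
            right_lows[self.dim] = self.cut
            self.left = RSTree(self.lows, left_highs, depth - 1, rng)
            self.right = RSTree(right_lows, self.highs, depth - 1, rng)

    def leaf(self, x):
        """Return the leaf node into which instance x falls."""
        node = self
        while node.left is not None:
            node = node.left if x[node.dim] < node.cut else node.right
        return node

    def volume(self):
        """Volume of the axis-aligned box covered by this node."""
        v = 1.0
        for lo, hi in zip(self.lows, self.highs):
            v *= hi - lo
        return v


class RSForest:
    """Averages the per-tree piecewise constant density estimates."""

    def __init__(self, lows, highs, n_trees=25, depth=8, seed=0):
        rng = random.Random(seed)
        self.trees = [RSTree(lows, highs, depth, rng) for _ in range(n_trees)]
        self.n = 0                      # number of instances seen so far

    def fit(self, data):
        for x in data:
            for tree in self.trees:
                tree.leaf(x).count += 1
            self.n += 1

    def score(self, x):
        """Average density at x; lower values suggest an anomaly."""
        total = 0.0
        for tree in self.trees:
            node = tree.leaf(x)
            vol = node.volume()
            if self.n > 0 and vol > 0:
                total += node.count / (self.n * vol)
        return total / len(self.trees)


# Hypothetical usage: a tight cluster of normal points in the unit square.
normal = [(0.5 + 0.01 * i, 0.5 + 0.01 * j)
          for i in range(-5, 6) for j in range(-5, 6)]
forest = RSForest([0.0, 0.0], [1.0, 1.0])
forest.fit(normal)
print(forest.score((0.5, 0.5)), forest.score((0.95, 0.05)))
```

Lower scores correspond to sparser regions of the learned density, so thresholding the score is one simple way to flag candidate anomalies in this sketch.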