  • DocumentCode
    866790
  • Title
    Hypergraph-Based Anomaly Detection of High-Dimensional Co-Occurrences
  • Author
    Silva, Jorge; Willett, Rebecca
  • Author_Institution
    Duke Univ., Durham, NC
  • Volume
    31
  • Issue
    3
  • fYear
    2009
  • fDate
    3/1/2009 12:00:00 AM
  • Firstpage
    563
  • Lastpage
    569
  • Abstract
    This paper addresses the problem of detecting anomalous multivariate co-occurrences using a limited number of unlabeled training observations. A novel method based on a hypergraph representation of the data is proposed to deal with this very high-dimensional problem. Hypergraphs are an important extension of graphs in which an edge can connect more than two vertices simultaneously. A variational expectation-maximization algorithm is presented for detecting anomalies directly in the hypergraph domain, without any feature selection or dimensionality reduction. The resulting estimate can be used to calculate a measure of anomalousness based on the false discovery rate. The algorithm has O(np) computational complexity, where n is the number of training observations and p is the number of potential participants in each co-occurrence event. This efficiency makes the method well suited to very high-dimensional settings, and the method requires no tuning, bandwidth, or regularization parameters. The proposed approach is validated on both high-dimensional synthetic data and the Enron email database, where p > 75,000, and the results show that it can outperform other state-of-the-art methods.
  • Keywords
    computational complexity; data analysis; expectation-maximisation algorithm; graph theory; spatial data structures; unsupervised learning; variational techniques; Enron email database; data hypergraph representation; dimensionality reduction; false discovery rate; feature selection; high-dimensional multivariate co-occurrence data analysis; hypergraph-based anomaly detection; variational expectation-maximization algorithm; Anomaly detection; Co-occurrence data; Variational methods; Algorithms; Artificial Intelligence; Computer Simulation; Models, Theoretical; Pattern Recognition, Automated
  • fLanguage
    English
  • Journal_Title
    Pattern Analysis and Machine Intelligence, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0162-8828
  • Type
    jour
  • DOI
    10.1109/TPAMI.2008.232
  • Filename
    4626961
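
The hypergraph representation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it only shows how co-occurrence observations map to hyperedges and to an n-by-p binary incidence matrix. The function names (`build_hypergraph`, `incidence_vector`) and the toy participant names are hypothetical and chosen purely for illustration.

```python
from typing import Dict, FrozenSet, Iterable, List


def build_hypergraph(observations: Iterable[Iterable[str]]) -> List[FrozenSet[str]]:
    """Turn each co-occurrence observation into one hyperedge (a set of vertices)."""
    return [frozenset(obs) for obs in observations]


def incidence_vector(edge: FrozenSet[str], vertex_index: Dict[str, int]) -> List[int]:
    """Encode one hyperedge as a length-p binary vector (1 = vertex participates)."""
    vec = [0] * len(vertex_index)
    for vertex in edge:
        vec[vertex_index[vertex]] = 1
    return vec


# Hypothetical toy data: each observation lists the participants of one event,
# for example the sender and recipients of a single email.
observations = [
    ["alice", "bob", "carol"],
    ["alice", "bob"],
    ["dave", "erin", "alice", "bob"],
]

edges = build_hypergraph(observations)

# p potential participants (vertices), indexed in sorted order.
vertices = sorted(set().union(*edges))
vertex_index = {v: i for i, v in enumerate(vertices)}

# n-by-p binary incidence matrix: one row per observation (hyperedge).
matrix = [incidence_vector(e, vertex_index) for e in edges]

# Vertex degree = number of hyperedges each participant appears in.
degrees = {v: sum(row[i] for row in matrix) for v, i in vertex_index.items()}
```

Unlike an ordinary graph edge, a single hyperedge here can join three or four participants at once, which is the property the abstract highlights. A dense n-by-p matrix is used only for readability; with p > 75,000, as in the Enron experiment, a sparse representation would be the more practical choice.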