DocumentCode
866790
Title
Hypergraph-Based Anomaly Detection of High-Dimensional Co-Occurrences
Author
Silva, Jorge ; Willett, Rebecca
Author_Institution
Duke Univ., Durham, NC
Volume
31
Issue
3
fYear
2009
fDate
3/1/2009 12:00:00 AM
Firstpage
563
Lastpage
569
Abstract
This paper addresses the problem of detecting anomalous multivariate co-occurrences using a limited number of unlabeled training observations. A novel method based on using a hypergraph representation of the data is proposed to deal with this very high-dimensional problem. Hypergraphs constitute an important extension of graphs which allow edges to connect more than two vertices simultaneously. A variational expectation-maximization algorithm for detecting anomalies directly on the hypergraph domain without any feature selection or dimensionality reduction is presented. The resulting estimate can be used to calculate a measure of anomalousness based on the false discovery rate. The algorithm has O(np) computational complexity, where n is the number of training observations and p is the number of potential participants in each co-occurrence event. This efficiency makes the method ideally suited for very high-dimensional settings, and requires no tuning, bandwidth or regularization parameters. The proposed approach is validated on both high-dimensional synthetic data and the Enron email database, where p > 75,000, and it is shown that it can outperform other state-of-the-art methods.
Keywords
computational complexity; data analysis; expectation-maximisation algorithm; graph theory; spatial data structures; unsupervised learning; variational techniques; Enron email database; computational complexity; data hypergraph representation; dimensionality reduction; false discovery rate; feature selection; high-dimensional multivariate co-occurrence data analysis; hypergraph-based anomaly detection; unsupervised learning; variational expectation-maximization algorithm; Anomaly detection; Co-occurrence data; False Discovery Rate; Unsupervised learning; Variational methods; Algorithms; Artificial Intelligence; Computer Simulation; Models, Theoretical; Pattern Recognition, Automated;
fLanguage
English
Journal_Title
Pattern Analysis and Machine Intelligence, IEEE Transactions on
Publisher
ieee
ISSN
0162-8828
Type
jour
DOI
10.1109/TPAMI.2008.232
Filename
4626961
Link To Document