DocumentCode
1890016
Title
Coping with training contamination in unsupervised distributional anomaly detection
Author
Borges, Nash ; Meyer, Gerard G L
Author_Institution
Dept. of Electr. & Comput. Eng., Johns Hopkins Univ., Baltimore, MD
fYear
2009
fDate
18-20 March 2009
Firstpage
264
Lastpage
269
Abstract
In previous work, we presented several distributional approaches to anomaly detection for a speech activity detector by training a model on purely nominal data and estimating the divergence between it and other input. Here, we reformulate the problem in an unsupervised framework and allow for anomalous contamination of the training data. After noting the instability of Gaussian mixture models (GMMs) in this context, we focus on non-parametric methods using regularly binned histograms. While the performance of the log-likelihood baseline suffered as the amount of contamination increased, many of the distributional approaches were not affected. We found that the L1 distance, χ² statistic, and information-theoretic divergences consistently outperformed the other methods for a variety of contamination levels and test segment lengths.
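The abstract names the histogram-based measures but does not define them, so the following is only an illustrative sketch of what comparing a test segment against a contaminated training histogram might look like. Everything specific here is an assumption rather than the authors' setup: the one-dimensional synthetic feature, the bin edges, the additive smoothing constant `eps`, the symmetric form of the χ² statistic, the direction of the Kullback-Leibler divergence, and the 5% contamination level.

```python
import numpy as np

def histogram(x, edges, eps=1e-6):
    """Regularly binned histogram, smoothed by eps and normalized to sum to 1."""
    counts, _ = np.histogram(x, bins=edges)
    p = counts.astype(float) + eps  # eps avoids zero bins in the ratios below
    return p / p.sum()

def l1_distance(p, q):
    """Sum of absolute bin-wise differences."""
    return float(np.abs(p - q).sum())

def chi2_statistic(p, q):
    """Symmetric chi-squared form (one of several common variants)."""
    return float(((p - q) ** 2 / (p + q)).sum())

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats."""
    return float((p * np.log(p / q)).sum())

rng = np.random.default_rng(0)

# Hypothetical training set: nominal samples with 5% anomalous contamination.
train = rng.normal(0.0, 1.0, 5000)
train[:250] = rng.normal(4.0, 1.0, 250)

edges = np.linspace(-8.0, 8.0, 33)  # 32 equal-width bins (assumed)
p_train = histogram(train, edges)

# Two hypothetical test segments of 200 samples each.
p_nominal = histogram(rng.normal(0.0, 1.0, 200), edges)
p_anomalous = histogram(rng.normal(4.0, 1.0, 200), edges)

measures = {"L1": l1_distance, "chi2": chi2_statistic, "KL": kl_divergence}
scores = {
    name: (fn(p_nominal, p_train), fn(p_anomalous, p_train))
    for name, fn in measures.items()
}

for name, (s_nom, s_anom) in scores.items():
    print(f"{name}: nominal={s_nom:.3f} anomalous={s_anom:.3f}")
```

In this sketch a segment is scored by how far its histogram lies from the training histogram, so a larger score suggests an anomaly. A decision threshold would still have to be chosen separately, which the abstract does not address.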
Keywords
Gaussian processes; learning (artificial intelligence); signal classification; speech recognition; statistical analysis; Gaussian mixture model; anomalous training data contamination; binned histogram; log likelihood baseline; speech activity detector; training classifier; unsupervised distributional anomaly detection; Contamination; Context modeling; Detectors; Histograms; Information theory; Speech; Statistical analysis; Statistical distributions; Testing; Training data;
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Sciences and Systems, 2009. CISS 2009. 43rd Annual Conference on
Conference_Location
Baltimore, MD
Print_ISBN
978-1-4244-2733-8
Electronic_ISBN
978-1-4244-2734-5
Type
conf
DOI
10.1109/CISS.2009.5054728
Filename
5054728