DocumentCode :
1890016
Title :
Coping with training contamination in unsupervised distributional anomaly detection
Author :
Borges, Nash ; Meyer, Gerard G. L.
Author_Institution :
Dept. of Electr. & Comput. Eng., Johns Hopkins Univ., Baltimore, MD
fYear :
2009
fDate :
18-20 March 2009
Firstpage :
264
Lastpage :
269
Abstract :
In previous work, we presented several distributional approaches to anomaly detection for a speech activity detector by training a model on purely nominal data and estimating the divergence between it and other input. Here, we reformulate the problem in an unsupervised framework and allow for anomalous contamination of the training data. After noting the instability of Gaussian mixture models (GMMs) in this context, we focus on non-parametric methods using regularly binned histograms. While the performance of the log-likelihood baseline suffered as the amount of contamination was increased, many of the distributional approaches were not affected. We found that the L1 distance, χ² statistic, and information-theoretic divergences consistently outperformed the other methods for a variety of contamination levels and test segment lengths.
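The abstract compares regularly binned histograms of training and test data using the L1 distance, the χ² statistic, information-theoretic divergences, and a log-likelihood baseline. The sketch below illustrates these scores. It is not the authors' implementation and rests on several assumptions: scalar (1-D) features, a fixed bin range [lo, hi], additive smoothing `eps`, the symmetric χ² form, and Kullback-Leibler and Jensen-Shannon divergences as examples of information-theoretic divergences. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def bin_index(x: float, lo: float, hi: float, n_bins: int) -> int:
    """Index of the equal-width bin containing x; out-of-range values go to the edge bins."""
    idx = int((x - lo) / ((hi - lo) / n_bins))
    return min(max(idx, 0), n_bins - 1)


def binned_histogram(samples: Sequence[float], lo: float, hi: float,
                     n_bins: int, eps: float = 1e-6) -> List[float]:
    """Normalized histogram with n_bins equal-width bins over [lo, hi].

    eps is an assumed additive smoothing term so that empty bins never
    produce log(0) or a division by zero in the scores below.
    """
    counts = [0.0] * n_bins
    for x in samples:
        counts[bin_index(x, lo, hi, n_bins)] += 1.0
    total = sum(counts) + eps * n_bins
    return [(c + eps) / total for c in counts]


def l1_distance(p: Sequence[float], q: Sequence[float]) -> float:
    return sum(abs(a - b) for a, b in zip(p, q))


def chi2_statistic(p: Sequence[float], q: Sequence[float]) -> float:
    # Symmetric chi-square form (an assumed variant): sum (p - q)^2 / (p + q).
    return sum((a - b) ** 2 / (a + b) for a, b in zip(p, q))


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    # D(p || q); requires every bin of q to be positive (guaranteed by eps).
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    # Jensen-Shannon divergence: symmetric and always finite.
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def mean_log_likelihood(samples: Sequence[float], model: Sequence[float],
                        lo: float, hi: float) -> float:
    # Baseline: average log bin probability of each test sample under the
    # training histogram. This uses bin masses, not densities; the constant
    # -log(bin width) term is omitted because it does not change rankings.
    n_bins = len(model)
    return sum(math.log(model[bin_index(x, lo, hi, n_bins)])
               for x in samples) / len(samples)


def segment_scores(test_samples: Sequence[float], model: Sequence[float],
                   lo: float, hi: float) -> Dict[str, float]:
    """Anomaly scores for one test segment; larger means more anomalous."""
    q = binned_histogram(test_samples, lo, hi, len(model))
    return {
        "l1": l1_distance(q, model),
        "chi2": chi2_statistic(q, model),
        "kl": kl_divergence(q, model),
        "js": js_divergence(q, model),
        # Negated so that, like the divergences, larger means more anomalous.
        "neg_loglik": -mean_log_likelihood(test_samples, model, lo, hi),
    }
```

In the unsupervised setting, `model` is built with `binned_histogram` from all available training data, so it may already contain some anomalous samples. Each test segment is then scored with `segment_scores`, and a threshold (a free choice here) flags anomalous segments. A plausible reason the distributional scores tolerate contamination is that they compare whole segment histograms. Under this reading, a small contaminating mass in the model changes them only slightly, whereas the per-sample log-likelihood rewards any test data that falls in the contaminated bins.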
Keywords :
Gaussian processes; learning (artificial intelligence); signal classification; speech recognition; statistical analysis; Gaussian mixture model; anomalous training data contamination; binned histogram; log likelihood baseline; speech activity detector; training classifier; unsupervised distributional anomaly detection; Contamination; Context modeling; Detectors; Histograms; Information theory; Speech; Statistical analysis; Statistical distributions; Testing; Training data;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information Sciences and Systems, 2009. CISS 2009. 43rd Annual Conference on
Conference_Location :
Baltimore, MD
Print_ISBN :
978-1-4244-2733-8
Electronic_ISBN :
978-1-4244-2734-5
Type :
conf
DOI :
10.1109/CISS.2009.5054728
Filename :
5054728