DocumentCode
3165153
Title
Clustering Needles in a Haystack: An Information Theoretic Analysis of Minority and Outlier Detection
Author
Ando, Shin
Author_Institution
Yokohama Nat. Univ., Yokohama
fYear
2007
fDate
28-31 Oct. 2007
Firstpage
13
Lastpage
22
Abstract
Identifying atypical objects is one of the traditional topics in machine learning. Recently, novel approaches such as Minority Detection and One-class clustering have gone further, identifying clusters of atypical objects that contrast strongly with the rest of the data in terms of their distribution or density. This paper analyzes such tasks from an information-theoretic perspective. Under the Information Bottleneck formalization, these tasks can be interpreted as increasing the average atypicalness of the clusters while reducing the complexity of the clustering. This formalization yields a unifying view of the new approaches as well as classic outlier detection. We also present a scalable minimization algorithm that exploits the localized form of the cost function over individual clusters. The proposed algorithm is evaluated on simulated datasets and a text classification benchmark, in comparison with an existing method.
Keywords
learning (artificial intelligence); object detection; pattern classification; information bottleneck formalization; information theoretic analysis; machine learning; minority detection; needles clustering; one-class clustering; scalable minimization algorithm; simulated datasets; text classification; Clustering algorithms; Cost function; Data mining; Information analysis; Machine learning; Machine learning algorithms; Needles; Object detection; Rate distortion theory; Unsupervised learning;
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining, 2007. ICDM 2007. Seventh IEEE International Conference on
Conference_Location
Omaha, NE
ISSN
1550-4786
Print_ISBN
978-0-7695-3018-5
Type
conf
DOI
10.1109/ICDM.2007.53
Filename
4470225