DocumentCode
3105358
Title
P3C: A Robust Projected Clustering Algorithm
Author
Moise, Gabriela ; Sander, Jörg ; Ester, Martin
Author_Institution
Dept. of Comput. Sci., Alberta Univ., Edmonton, AB
fYear
2006
fDate
18-22 Dec. 2006
Firstpage
414
Lastpage
425
Abstract
Projected clustering has emerged as a possible solution to the challenges associated with clustering in high-dimensional data. A projected cluster is a subset of points together with a subset of attributes, such that the cluster points project onto a small range of values in each of these attributes and are uniformly distributed in the remaining attributes. Existing algorithms for projected clustering rely on parameters whose appropriate values are difficult for the user to set, or are unable to identify projected clusters with few relevant attributes. In this paper, we present a robust algorithm for projected clustering that can effectively discover projected clusters in the data while minimizing the number of parameters required as input. In contrast to all previous approaches, our algorithm can discover, under very general conditions, the true number of projected clusters. We show through an extensive experimental evaluation that our algorithm: (1) significantly outperforms existing algorithms for projected clustering in terms of accuracy; (2) is effective in detecting very low-dimensional projected clusters embedded in high-dimensional spaces; (3) is effective in detecting clusters with varying orientation in their relevant subspaces; (4) is scalable with respect to large data sets and a high number of dimensions.
Keywords
pattern clustering; very large databases; extensive experimental evaluation; high dimensional data; large data sets; robust algorithm; robust projected clustering algorithm; Clustering algorithms; Data models; Databases; Nearest neighbor searches; Principal component analysis; Robustness;
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining, 2006. ICDM '06. Sixth International Conference on
Conference_Location
Hong Kong
ISSN
1550-4786
Print_ISBN
0-7695-2701-7
Type
conf
DOI
10.1109/ICDM.2006.123
Filename
4053068