DocumentCode
3125129
Title
Detection of Arbitrarily Oriented Synchronized Clusters in High-Dimensional Data
Author
Shao, Junming ; Plant, Claudia ; Yang, Qinli ; Böhm, Christian
Author_Institution
Inst. for Comput. Sci., Univ. of Munich, Munich, Germany
fYear
2011
fDate
11-14 Dec. 2011
Firstpage
607
Lastpage
616
Abstract
How can clustering address the challenges of the "curse of dimensionality"? Clustering is a powerful data mining technique for structuring and organizing vast amounts of data. However, high-dimensional data spaces are usually very sparse, and meaningful clusters can often be found only in lower-dimensional subspaces. In many applications, the subspaces hosting the clusters provide valuable information for interpreting the major patterns in the data. Detecting subspace clusters is challenging because many of the attributes are usually noisy, some attributes may be correlated with each other, and only a few attributes truly contribute to the cluster structure. In this paper, we propose ORSC (Arbitrarily ORiented Synchronized Clusters), a novel, effective and efficient method for subspace clustering inspired by synchronization. Synchronization is a basic phenomenon prevalent in nature, capable of controlling even highly complex processes such as opinion formation in a group. Such complex processes are controlled by simple operations based on interactions between objects. Relying on the interaction model for synchronization, ORSC (1) naturally detects correlation clusters in arbitrarily oriented subspaces, including (2) arbitrarily shaped non-linear correlation clusters. Our approach is (3) robust against noise points and outliers. In contrast to previous methods, ORSC is (4) easy to parameterize, since there is no need to specify the subspace dimensionality and all interesting subspace clusters can be detected. Finally, (5) ORSC outperforms most of the compared methods in runtime efficiency and scales well to large, high-dimensional data sets.
Keywords
data mining; pattern clustering; ORSC; arbitrarily oriented subspaces; arbitrarily oriented synchronized cluster detection; complex processes; correlation clusters; data mining technique; high dimensional data; noise points; outliers; subspace dimensionality; Clustering algorithms; Correlation; Covariance matrix; Eigenvalues and eigenfunctions; Oscillators; Principal component analysis; Synchronization; high-dimensional data; interaction model; subspace clustering; synchronization;
fLanguage
English
Publisher
ieee
Conference_Titel
Data Mining (ICDM), 2011 IEEE 11th International Conference on
Conference_Location
Vancouver, BC
ISSN
1550-4786
Print_ISBN
978-1-4577-2075-8
Type
conf
DOI
10.1109/ICDM.2011.50
Filename
6137265