Title :
Detection of Arbitrarily Oriented Synchronized Clusters in High-Dimensional Data
Author :
Shao, Junming ; Plant, Claudia ; Yang, Qinli ; Böhm, Christian
Author_Institution :
Inst. for Comput. Sci., Univ. of Munich, Munich, Germany
Abstract :
How can the challenges of the "curse of dimensionality" in clustering be addressed? Clustering is a powerful data mining technique for structuring and organizing vast amounts of data. However, high-dimensional data spaces are usually very sparse, and meaningful clusters can only be found in lower-dimensional subspaces. In many applications, the subspaces hosting the clusters provide valuable information for interpreting the major patterns in the data. Detecting subspace clusters is challenging because many of the attributes are usually noisy, some attributes may be correlated with each other, and only a few of the attributes truly contribute to the cluster structure. In this paper, we propose ORSC (Arbitrarily ORiented Synchronized Clusters), a novel, effective, and efficient method for subspace clustering inspired by synchronization. Synchronization is a basic phenomenon prevalent in nature, capable of controlling even highly complex processes such as opinion formation in a group. Control of complex processes is achieved by simple operations based on interactions between objects. Relying on the interaction model for synchronization, our approach ORSC (1) naturally detects correlation clusters in arbitrarily oriented subspaces, including (2) arbitrarily shaped non-linear correlation clusters. Our approach is (3) robust against noise points and outliers. In contrast to previous methods, ORSC is (4) easy to parameterize, since there is no need to specify the subspace dimensionality, and all interesting subspace clusters can be detected. Finally, (5) ORSC outperforms most comparison methods in terms of runtime efficiency and is highly scalable to large and high-dimensional data sets.
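Illustrative note (not part of the original record): the abstract does not state ORSC's interaction rule. To give a rough sense of how "simple operations based on interactions between objects" can drive points into clusters, the sketch below uses a generic Kuramoto-style local update in the full data space. The function names `sync_step` and `synchronize`, the radius `eps`, and the stopping parameters are assumptions made for this example. It is not the ORSC algorithm, which per the abstract also detects arbitrarily oriented subspaces.

```python
import math

def sync_step(points, eps):
    """One synchronous update: each point is pulled toward the points
    within distance eps of it (the neighborhood includes the point itself)."""
    new_points = []
    for p in points:
        neighbors = [q for q in points if math.dist(p, q) <= eps]
        # Kuramoto-style coupling, applied per dimension and averaged
        # over the neighborhood (an assumed, generic interaction rule).
        new_points.append([
            p[d] + sum(math.sin(q[d] - p[d]) for q in neighbors) / len(neighbors)
            for d in range(len(p))
        ])
    return new_points

def synchronize(points, eps, max_iter=50, tol=1e-6):
    """Repeat sync_step until no point moves more than tol."""
    current = [list(p) for p in points]
    for _ in range(max_iter):
        nxt = sync_step(current, eps)
        shift = max(math.dist(a, b) for a, b in zip(current, nxt))
        current = nxt
        if shift < tol:
            break
    return current

# Two tight groups plus one isolated point (hypothetical toy data).
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
        (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
        (10.0, 0.0)]
out = synchronize(data, eps=0.5)
```

In this toy setting, points that end up at (nearly) the same location can be read as one cluster, while a point with no neighbors within `eps` does not move and can be treated as an outlier.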
Keywords :
data mining; pattern clustering; ORSC; arbitrarily oriented subspaces; arbitrarily oriented synchronized cluster detection; complex processes; correlation clusters; data mining technique; high dimensional data; noise points; outliers; subspace dimensionality; Clustering algorithms; Correlation; Covariance matrix; Eigenvalues and eigenfunctions; Oscillators; Principal component analysis; Synchronization; high-dimensional data; interaction model; subspace clustering; synchronization;
Conference_Title :
Data Mining (ICDM), 2011 IEEE 11th International Conference on
Conference_Location :
Vancouver, BC
Print_ISBN :
978-1-4577-2075-8
DOI :
10.1109/ICDM.2011.50