• DocumentCode
    3125129
  • Title
    Detection of Arbitrarily Oriented Synchronized Clusters in High-Dimensional Data
  • Author
    Shao, Junming ; Plant, Claudia ; Yang, Qinli ; Böhm, Christian
  • Author_Institution
    Inst. for Comput. Sci., Univ. of Munich, Munich, Germany
  • fYear
    2011
  • fDate
    11-14 Dec. 2011
  • Firstpage
    607
  • Lastpage
    616
  • Abstract
    How can the challenges of the "curse of dimensionality" be addressed in clustering? Clustering is a powerful data mining technique for structuring and organizing vast amounts of data. However, the high-dimensional data space is usually very sparse, and meaningful clusters can only be found in lower-dimensional subspaces. In many applications, the subspaces hosting the clusters provide valuable information for interpreting the major patterns in the data. Detecting subspace clusters is challenging because usually many of the attributes are noisy, some attributes may be correlated with each other, and only a few of the attributes truly contribute to the cluster structure. In this paper, we propose ORSC (Arbitrarily ORiented Synchronized Clusters), a novel, effective and efficient subspace clustering method inspired by synchronization. Synchronization is a basic phenomenon prevalent in nature, capable of controlling even highly complex processes such as opinion formation in a group. Control of complex processes is achieved by simple operations based on interactions between objects. Relying on the interaction model for synchronization, our approach ORSC (1) naturally detects correlation clusters in arbitrarily oriented subspaces, including (2) arbitrarily shaped non-linear correlation clusters. Our approach is (3) robust against noise points and outliers. In contrast to previous methods, ORSC is (4) easy to parameterize, since there is no need to specify the subspace dimensionality and all interesting subspace clusters can be detected. Finally, (5) ORSC outperforms most comparison methods in terms of runtime efficiency and is highly scalable to large and high-dimensional data sets.
  • Keywords
    data mining; pattern clustering; ORSC; arbitrarily oriented subspaces; arbitrarily oriented synchronized cluster detection; complex processes; correlation clusters; data mining technique; high dimensional data; noise points; outliers; subspace dimensionality; Clustering algorithms; Correlation; Covariance matrix; Eigenvalues and eigenfunctions; Oscillators; Principal component analysis; Synchronization; high-dimensional data; interaction model; subspace clustering; synchronization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining (ICDM), 2011 IEEE 11th International Conference on
  • Conference_Location
    Vancouver, BC
  • ISSN
    1550-4786
  • Print_ISBN
    978-1-4577-2075-8
  • Type
    conf
  • DOI
    10.1109/ICDM.2011.50
  • Filename
    6137265