• DocumentCode
    822249
  • Title
    Mining Projected Clusters in High-Dimensional Spaces
  • Author
    Bouguessa, Mohamed ; Wang, Shengrui
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Sherbrooke, Sherbrooke, QC
  • Volume
    21
  • Issue
    4
  • fYear
    2009
  • fDate
    4/1/2009
  • Firstpage
    507
  • Lastpage
    522
  • Abstract
    Clustering high-dimensional data has been a major challenge due to the inherent sparsity of the points. Most existing clustering algorithms become substantially inefficient if the required similarity measure is computed between data points in the full-dimensional space. To address this problem, a number of projected clustering algorithms have been proposed. However, most of them encounter difficulties when clusters hide in subspaces with very low dimensionality. These challenges motivate our effort to propose a robust partitional distance-based projected clustering algorithm. The algorithm consists of three phases. The first phase performs attribute relevance analysis by detecting dense and sparse regions and their location in each attribute. Starting from the results of the first phase, the goal of the second phase is to eliminate outliers, while the third phase aims to discover clusters in different subspaces. The clustering process is based on the k-means algorithm, with the computation of distance restricted to subsets of attributes where object values are dense. Our algorithm is capable of detecting projected clusters of low dimensionality embedded in a high-dimensional space and avoids the computation of the distance in the full-dimensional space. The suitability of our proposal has been demonstrated through an empirical study using synthetic and real datasets.
  • Keywords
    data mining; pattern clustering; attribute relevance analysis; dense region detection; high-dimensional data clustering; k-means algorithm; projected clustering mining; robust partitional distance; sparse region detection; Clustering; Mining methods and algorithms
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2008.162
  • Filename
    4585382
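
The abstract describes a k-means-style clustering step in which distance is computed only over the attributes where a cluster's values are dense, rather than in the full-dimensional space. The following is a minimal sketch of that one idea, not the authors' algorithm: the function names, the per-cluster lists of relevant dimensions, and the normalization by the number of relevant dimensions are all assumptions made for illustration. The attribute relevance analysis and outlier elimination phases are not shown; the relevant dimensions are simply given as input.

```python
import math


def restricted_distance(x, center, relevant_dims):
    """Root-mean-square difference over the relevant dimensions only.

    Dividing by the number of relevant dimensions (an assumption of this
    sketch) keeps distances comparable between clusters whose subspaces
    have different sizes.
    """
    if not relevant_dims:
        return math.inf  # a cluster with no relevant attribute attracts nothing
    total = sum((x[d] - center[d]) ** 2 for d in relevant_dims)
    return math.sqrt(total / len(relevant_dims))


def assign(points, centers, relevant_dims_per_cluster):
    """Assign each point to the nearest center under the restricted distance."""
    labels = []
    for x in points:
        dists = [
            restricted_distance(x, c, dims)
            for c, dims in zip(centers, relevant_dims_per_cluster)
        ]
        labels.append(dists.index(min(dists)))
    return labels


# Hypothetical 4-dimensional data: cluster 0 is dense on dimensions 0 and 1,
# cluster 1 is dense on dimensions 2 and 3; other coordinates are noise.
points = [
    [0.1, -0.1, 9.0, -7.0],
    [8.0, -6.0, 5.1, 4.9],
]
centers = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 5.0, 5.0],
]
relevant_dims_per_cluster = [[0, 1], [2, 3]]

labels = assign(points, centers, relevant_dims_per_cluster)
```

In this toy example each point is close to one center in that center's own subspace, so the noisy coordinates outside the subspace do not influence the assignment.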