  • DocumentCode
    3105358
  • Title
    P3C: A Robust Projected Clustering Algorithm
  • Author
    Moise, Gabriela; Sander, Jörg; Ester, Martin
  • Author_Institution
    Dept. of Comput. Sci., Alberta Univ., Edmonton, AB
  • fYear
    2006
  • fDate
    18-22 Dec. 2006
  • Firstpage
    414
  • Lastpage
    425
  • Abstract
    Projected clustering has emerged as a possible solution to the challenges associated with clustering in high-dimensional data. A projected cluster is a subset of points together with a subset of attributes, such that the cluster points project onto a small range of values in each of these attributes, and are uniformly distributed in the remaining attributes. Existing algorithms for projected clustering rely on parameters whose appropriate values are difficult for the user to set, or are unable to identify projected clusters with few relevant attributes. In this paper, we present a robust algorithm for projected clustering that can effectively discover projected clusters in the data while minimizing the number of parameters required as input. In contrast to all previous approaches, our algorithm can discover, under very general conditions, the true number of projected clusters. We show through an extensive experimental evaluation that our algorithm: (1) significantly outperforms existing algorithms for projected clustering in terms of accuracy; (2) is effective in detecting very low-dimensional projected clusters embedded in high-dimensional spaces; (3) is effective in detecting clusters with varying orientation in their relevant subspaces; (4) is scalable with respect to large data sets and a high number of dimensions.
  • Keywords
    pattern clustering; very large databases; extensive experimental evaluation; high dimensional data; large data sets; robust algorithm; robust projected clustering algorithm; Clustering algorithms; Data models; Databases; Nearest neighbor searches; Principal component analysis; Robustness
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Sixth International Conference on Data Mining (ICDM '06), 2006
  • Conference_Location
    Hong Kong
  • ISSN
    1550-4786
  • Print_ISBN
    0-7695-2701-7
  • Type
    conf
  • DOI
    10.1109/ICDM.2006.123
  • Filename
    4053068