• DocumentCode
    1220864
  • Title
    Projective clustering by histograms
  • Author
    Ng, Eric Ka Ka ; Fu, Ada Wai-Chee ; Wong, Raymond Chi-Wing
  • Author_Institution
    Dept. of Comput. Sci., Chinese Univ. of Hong Kong, Shatin, China
  • Volume
    17
  • Issue
    3
  • fYear
    2005
  • fDate
    3/1/2005
  • Firstpage
    369
  • Lastpage
    383
  • Abstract
    Recent research suggests that clustering for high-dimensional data should involve searching for "hidden" subspaces with lower dimensionalities, in which patterns can be observed when data objects are projected onto the subspaces. Discovering such interattribute correlations and location of the corresponding clusters is known as the projective clustering problem. We propose an efficient projective clustering technique by histogram construction (EPCH). The histograms help to generate "signatures", where a signature corresponds to some region in some subspace, and signatures with a large number of data objects are identified as the regions for subspace clusters. Hence, projected clusters and their corresponding subspaces can be uncovered. Compared to the best previous methods to our knowledge, this approach is more flexible in that less prior knowledge on the data set is required, and it is also much more efficient. Our experiments compare behaviors and performances of this approach and other projective clustering algorithms with different data characteristics. The results show that our technique is scalable to very large databases, and it is able to return accurate clustering results.
  • Keywords
    data mining; pattern clustering; statistical analysis; very large databases; high-dimensional data; histogram construction; projective clustering algorithms; Clustering algorithms; Histograms; Image analysis; Image databases; Image segmentation; Partitioning algorithms; Pattern analysis; Pattern recognition; Principal component analysis; Spatial databases
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2005.47
  • Filename
    1388247