• DocumentCode
    2849742
  • Title

    Subspace selection for clustering high-dimensional data

  • Author

    Baumgartner, Christian ; Plant, Claudia ; Kailing, K. ; Kriegel, Hans-Peter ; Kröger, Peer

  • Author_Institution
    Univ. for Health Sci., Med. Informatics & Technol., Innsbruck, Austria
  • fYear
    2004
  • fDate
    1-4 Nov. 2004
  • Firstpage
    11
  • Lastpage
    18
  • Abstract
    In high-dimensional feature spaces, traditional clustering algorithms tend to break down in terms of efficiency and quality. Nevertheless, the data sets often contain clusters that are hidden in various subspaces of the original feature space. In this paper, we present a feature selection technique called SURFING (subspaces relevant for clustering) that finds all subspaces interesting for clustering and sorts them by relevance. The sorting is based on a quality criterion for the interestingness of a subspace that uses the k-nearest neighbor distances of the objects. As our method is more or less parameterless, it addresses the unsupervised notion of the data mining task "clustering" in the best possible way. A broad evaluation based on synthetic and real-world data sets demonstrates that SURFING is suitable for finding all relevant subspaces in high-dimensional, sparse data sets and produces better results than comparative methods.
  • Keywords
    data mining; pattern clustering; sorting; SURFING; clustering algorithm; feature selection; feature spaces; high-dimensional data clustering; k-nearest neighbor distances; sparse data sets; subspace selection; subspaces relevant for clustering; Biomedical informatics; Clustering algorithms; Clustering methods; Computer science; Density measurement; Navigation; Principal component analysis; Space technology
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining, 2004. ICDM '04. Fourth IEEE International Conference on
  • Print_ISBN
    0-7695-2142-8
  • Type

    conf

  • DOI
    10.1109/ICDM.2004.10112
  • Filename
    1410261