• DocumentCode
    3269475
  • Title
    Spectral analysis of text collection for similarity-based clustering
  • Author
    Li, Wenyuan ; Ng, Wee-Keong ; Lim, Ee-Peng

  • Author_Institution
    Center for Adv. Inf. Syst., Nanyang Technol. Univ., Singapore
  • fYear
    2004
  • fDate
    30 March-2 April 2004
  • Firstpage
    833
  • Abstract
    Clustering of text collections is generally difficult due to their high dimensionality, heterogeneity, and large size. These characteristics compound the problem of determining the appropriate similarity space for clustering algorithms. Here, we propose to use the spectral analysis of the similarity space of a text collection to predict clustering behavior before actual clustering is performed. Spectral analysis is a technique that has been adopted across different domains to analyze the key encoding information of a system. Using spectral analysis for prediction is useful in first determining the quality of the similarity space and discovering any possible problems the selected feature set may present.
  • Keywords
    graph theory; statistical analysis; text analysis; key encoding information; similarity-based clustering; spectral analysis; text collection clustering; Clustering algorithms; Eigenvalues and eigenfunctions; Encoding; Graph theory; Information analysis; Information systems; Laplace equations; Space technology; Spectral analysis; Web pages
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering, 2004. Proceedings. 20th International Conference on
  • ISSN
    1063-6382
  • Print_ISBN
    0-7695-2065-0
  • Type
    conf
  • DOI
    10.1109/ICDE.2004.1320064
  • Filename
    1320064