• DocumentCode
    3318187
  • Title

    A comparative study on unsupervised feature selection methods for text clustering

  • Author

    Liu, Luying ; Kang, Jianchu ; Yu, Jing ; Wang, Zhongliang

  • Author_Institution
    Sch. of Comput. Sci., Beijing Univ., China
  • fYear
    2005
  • fDate
    30 Oct.-1 Nov. 2005
  • Firstpage
    597
  • Lastpage
    601
  • Abstract
    Text clustering is one of the central problems in text mining and information retrieval. Because of the high dimensionality of the feature space and the inherent data sparsity, the performance of clustering algorithms declines dramatically. Two techniques are used to deal with this problem: feature extraction and feature selection. Feature selection methods have been successfully applied to text categorization but seldom to text clustering, because class label information is unavailable. In this paper, four unsupervised feature selection methods are introduced: DF, TC, TVQ, and a newly proposed method, TV. Experiments show that feature selection can improve both the efficiency and the accuracy of text clustering. Three clustering validity criteria are studied and used to evaluate the clustering results.
  • Keywords
    data mining; feature extraction; pattern clustering; text analysis; unsupervised learning; data sparsity; feature space dimensionality; information retrieval; text clustering; text mining; unsupervised feature selection; Clustering algorithms; Computer science; Feature extraction; Frequency; Information retrieval; Navigation; Principal component analysis; TV; Text categorization; Text mining
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Language Processing and Knowledge Engineering, 2005. IEEE NLP-KE '05. Proceedings of 2005 IEEE International Conference on
  • Print_ISBN
    0-7803-9361-9
  • Type

    conf

  • DOI
    10.1109/NLPKE.2005.1598807
  • Filename
    1598807