• DocumentCode
    2550601
  • Title

    Research on the feature selection techniques used in text classification

  • Author

    Li, Yan ; Chen, Chungang

  • Author_Institution
    Sch. of Comput. Sci. & Eng., Xi'an Univ. of Technol., Xi'an, China
  • fYear
    2012
  • fDate
    29-31 May 2012
  • Firstpage
    725
  • Lastpage
    729
  • Abstract
    With the ever-increasing number of digital documents, the ability to classify those documents automatically, both quickly and accurately, is becoming more critical and more difficult. A text classification system for Chinese documents is developed in this paper. An HTF-WDF algorithm is proposed for feature selection. Unlike other feature selection algorithms, this method considers the effect of term frequency. Using the idea of fuzzy features, the terms with high term frequency (HTF) are distinguished and appended to the feature list. The features that can represent the topic of the documents are picked out according to the weighted document frequencies (WDF), which can avoid the problems of the traditional document frequency (DF) method. Then the Support Vector Machine (SVM) is used to train the classifier. The proposed algorithm is verified on representative Chinese documents. The experimental results show that the proposed algorithm outperforms the traditional DF algorithm.
  • Keywords
    fuzzy set theory; natural language processing; pattern classification; support vector machines; text analysis; digital document; document classification; feature selection technique; fuzzy feature; high term frequency; representative Chinese document verification; support vector machine; text classification system; weighted document frequency; Accuracy; Algorithm design and analysis; Classification algorithms; Support vector machine classification; Testing; Text categorization; Training; feature selection; machine learning; support vector machine; text classification;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Fuzzy Systems and Knowledge Discovery (FSKD), 2012 9th International Conference on
  • Conference_Location
    Sichuan
  • Print_ISBN
    978-1-4673-0025-4
  • Type

    conf

  • DOI
    10.1109/FSKD.2012.6234223
  • Filename
    6234223