• DocumentCode
    604467
  • Title
    Comparative analysis on feature selection based Bayesian text classification
  • Author
    Guang Yang; Zhong-Yi Lin; Yu-Xin Chang; Lei Wang; Jin-Kun Tian
  • Author_Institution
    Run Technol. Co., Ltd., Beijing, China
  • fYear
    2012
  • fDate
    29-31 Dec. 2012
  • Firstpage
    1190
  • Lastpage
    1194
  • Abstract
    Feature selection is an important preprocessing step in classification and regression learning. Many feature selection algorithms have been proposed that use different information criteria based on mutual information. However, no comparative study has analysed the effectiveness of these methods within a specific application framework. In this paper, we select six feature selection algorithms, namely RelFss, MIFS-U, FCBF, CMIM, mRMR, and mMIFS-U, and compare their reduction capabilities and classification performance in naive Bayesian text classification. As experimental data, we collect a large number of documents from ten different domains on a Chinese news Web site (www.people.com.cn), each containing at least 1,000 Chinese characters. The experimental results show that the naive Bayesian classifier using the features selected by mRMR achieves the highest classification accuracy. The summarized conclusions provide guidelines for feature selection in text classification applications.
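The abstract reports that mRMR (minimum redundancy, maximum relevance) gave the best naive Bayesian accuracy. As an illustration only, here is a minimal Python sketch of greedy mRMR in its difference form (relevance minus mean redundancy). It assumes binary term-presence features and plug-in mutual information estimated from counts. The function names `mutual_information`, `mrmr_score`, and `mrmr_select` and the toy data are hypothetical and not taken from the paper, and the sketch covers only the selection step, not the classifier.

```python
from collections import Counter
from math import log


def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in nats for two equal-length discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) * log(p(a,b) / (p(a) p(b))), rewritten in terms of raw counts
        mi += (c / n) * log(c * n / (px[a] * py[b]))
    return mi


def mrmr_score(f, features, relevance, selected):
    """Difference-form mRMR score: I(f;class) minus mean I(f;s) over selected s."""
    if not selected:
        return relevance[f]
    redundancy = sum(
        mutual_information(features[f], features[s]) for s in selected
    ) / len(selected)
    return relevance[f] - redundancy


def mrmr_select(features, labels, k):
    """Greedily pick up to k feature names (hypothetical helper, toy scale)."""
    relevance = {f: mutual_information(v, labels) for f, v in features.items()}
    selected, candidates = [], set(features)
    while candidates and len(selected) < k:
        # sorted() makes tie-breaking deterministic (alphabetical order)
        best = max(
            sorted(candidates),
            key=lambda f: mrmr_score(f, features, relevance, selected),
        )
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical toy corpus: 8 documents, 4 classes, binary term presence.
labels = ["A", "A", "B", "B", "C", "C", "D", "D"]
features = {
    "goal":  [1, 1, 1, 1, 0, 0, 0, 0],
    "match": [1, 1, 1, 1, 0, 0, 0, 0],  # exact duplicate of "goal"
    "vote":  [1, 1, 0, 0, 1, 1, 0, 0],
}
print(mrmr_select(features, labels, 2))  # ['goal', 'vote']
```

All three terms are equally relevant to the labels here, but "match" duplicates "goal", so its redundancy cancels its relevance and mRMR picks "vote" second. Ranking by relevance alone could not make that distinction.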
  • Keywords
    belief networks; pattern classification; text analysis; CMIM algorithm; Chinese news Web site; FCBF algorithm; MIFS-U algorithm; RelFss algorithm; classification accuracy; classification performance; feature selection algorithms; information criteria; mMIFS-U algorithm; mRMR algorithm; mutual information; naive Bayesian based text classification; regression learning; Feature selection; mutual information; naive Bayesian classifier; text classification;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science and Network Technology (ICCSNT), 2012 2nd International Conference on
  • Conference_Location
    Changchun
  • Print_ISBN
    978-1-4673-2963-7
  • Type
    conf
  • DOI
    10.1109/ICCSNT.2012.6526137
  • Filename
    6526137