DocumentCode
604467
Title
Comparative analysis on feature selection based Bayesian text classification
Author
Guang Yang ; Zhong-Yi Lin ; Yu-Xin Chang ; Lei Wang ; Jin-Kun Tian
Author_Institution
Run Technol. Co., Ltd., Beijing, China
fYear
2012
fDate
29-31 Dec. 2012
Firstpage
1190
Lastpage
1194
Abstract
Feature selection is an important data preprocessing step in classification and regression learning. Many feature selection algorithms have been proposed that use different information criteria based on mutual information. However, no comparative study has been conducted to analyse the effectiveness of these methods under a specific application framework. In this paper, we select six feature selection algorithms, i.e., RelFss, MIFS-U, FCBF, CMIM, mRMR, and mMIFS-U, and compare their reduction capabilities and classification performance in naive Bayesian-based text classification. As experimental data, we collect a large number of documents belonging to ten different domains from the Chinese news Web site (www.people.com.cn), where each document contains at least 1,000 Chinese characters. From the experimental results, we conclude that the naive Bayesian classifier achieves the highest classification accuracy with the features selected by mRMR. These conclusions offer some guidelines for feature selection in text classification applications.
Keywords
belief networks; pattern classification; text analysis; CMIM algorithm; Chinese news Web site; FCBF algorithm; MIFS-U algorithm; RelFss algorithm; classification accuracy; classification performance; feature selection algorithms; information criteria; mMIFS-U algorithm; mRMR algorithm; mutual information; naive Bayesian based text classification; regression learning; Feature selection; mutual information; naive Bayesian classifier; text classification;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Science and Network Technology (ICCSNT), 2012 2nd International Conference on
Conference_Location
Changchun
Print_ISBN
978-1-4673-2963-7
Type
conf
DOI
10.1109/ICCSNT.2012.6526137
Filename
6526137