DocumentCode :
1598439
Title :
A Feature Selection Simultaneously Based on Intra-category and Extra-Category for Text Categorization
Author :
Liu, Zhiying ; Yang, Jieming
Author_Institution :
Coll. of Inf. Eng., Northeast Dianli Univ., Jilin, China
Volume :
2
fYear :
2011
Firstpage :
178
Lastpage :
181
Abstract :
Text categorization is an important means of automatically processing information, the volume of which is growing exponentially. However, because text corpora are high-dimensional, many sophisticated classifiers cannot be used efficiently and effectively for text categorization, and feature selection has therefore become a research focus in the field. In this paper, we propose a new feature selection method, named SIE, which simultaneously considers the number of documents that contain a feature both within a category (intra-category) and outside it (extra-category). We compare the proposed method with four well-known feature selection methods using two classification algorithms on two text corpora. The experiments show that, in terms of accuracy, the proposed method performs significantly better than information gain, orthogonal centroid feature selection, and Poisson distribution, and performs comparably to the χ² statistic, when Naïve Bayes and support vector machine classifiers are used.
Keywords :
Bayes methods; support vector machines; text analysis; Naive Bayes classifier; Poisson distribution; SIE; SVM; χ² statistic; classification algorithms; extracategory; information gain; intracategory; orthogonal centroid feature selection; text categorization; text corpora; Accuracy; Educational institutions; Machine learning; Training; dimensionality reduction; feature selection
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Intelligent Human-Machine Systems and Cybernetics (IHMSC), 2011 International Conference on
Conference_Location :
Zhejiang
Print_ISBN :
978-1-4577-0676-9
Type :
conf
DOI :
10.1109/IHMSC.2011.114
Filename :
6038244