DocumentCode
3133936
Title
A feature selection based on deviation from feature centroid for text categorization
Author
Yang, Jieming ; Liu, Zhiying
Author_Institution
Coll. of Inf. Eng., Northeast Dianli Univ., Jilin, China
Volume
1
fYear
2011
fDate
25-28 July 2011
Firstpage
180
Lastpage
184
Abstract
Text categorization is vital for helping people automatically process information that grows exponentially. However, the high dimensionality of the vector space is a major obstacle to applying many sophisticated learning algorithms to text categorization. Feature selection has therefore become a research focus in text categorization. In this paper, we propose a new feature selection method, named FCFS, which uses the deviation from the feature centroid over all categories as the score of a feature. We compare the proposed method with four well-known feature selection methods using two classification algorithms on three datasets. The experiments show that the proposed method is significantly better than information gain, orthogonal centroid feature selection, mutual information, and odds ratio in terms of accuracy when Naïve Bayes and support vector machine classifiers are used.
Keywords
Bayes methods; feature extraction; learning (artificial intelligence); pattern classification; support vector machines; text analysis; FCFS; Naive Bayes classifier; feature selection; information gain; learning algorithm; orthogonal centroid feature selection; support vector machine; text categorization; vector space; Accuracy; Machine learning; Mutual information; Support vector machine classification; Text categorization; Training; feature selection; feature vector space; text categorization;
fLanguage
English
Publisher
IEEE
Conference_Titel
Intelligent Control and Information Processing (ICICIP), 2011 2nd International Conference on
Conference_Location
Harbin
Print_ISBN
978-1-4577-0813-8
Type
conf
DOI
10.1109/ICICIP.2011.6008227
Filename
6008227