DocumentCode :
2539546
Title :
Term-frequency Based Feature Selection Methods for Text Categorization
Author :
Xu, Yan ; Chen, Lin
fYear :
2010
fDate :
13-15 Dec. 2010
Firstpage :
280
Lastpage :
283
Abstract :
A major difficulty in text categorization is the high dimensionality of the feature space. Feature selection is an important step for reducing that space. Automatic feature selection methods such as document frequency thresholding (DF), information gain (IG), and mutual information (MI) are commonly applied in text categorization, but they do not use term frequency information. In this paper, we put forward improved DF, IG, and MI methods that use term frequency information. Experiments show that the improved methods perform notably better than the original DF, IG, and MI methods.
Keywords :
statistical analysis; text analysis; feature selection; improved document frequency thresholding; improved information gain; improved mutual information; term frequency information; text categorization; Classification algorithms; Frequency conversion; Machine learning; Mutual information; Text categorization; Time frequency analysis;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Genetic and Evolutionary Computing (ICGEC), 2010 Fourth International Conference on
Conference_Location :
Shenzhen
Print_ISBN :
978-1-4244-8891-9
Electronic_ISBN :
978-0-7695-4281-2
Type :
conf
DOI :
10.1109/ICGEC.2010.76
Filename :
5715424