DocumentCode :
615260
Title :
Improved mutual information method for text feature selection
Author :
Ding Xiaoming ; Tang Yan
Author_Institution :
Coll. of Comput. & Inf. Sci., Southwest Univ., Chongqing, China
fYear :
2013
fDate :
26-28 April 2013
Firstpage :
163
Lastpage :
166
Abstract :
Reducing the dimensionality of a high-dimensional feature set is one of the main difficulties in text categorization. Feature selection has been applied effectively in text classification because of its low computational complexity. Prior research shows that mutual information is a good feature selection method, but it does not consider the term frequency in each category of the corpus or the connections between terms. To remedy these defects of the traditional mutual information method, this article improves the mutual information measure by introducing the frequency of a feature within a class and the dispersion of a feature within a class. An experimental platform was built by constructing a Chinese text classification system, and multiple sets of experiments were run on this system. The results show that the new feature selection approach performs better in text categorization.
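For orientation, the traditional measure that the abstract sets out to improve can be sketched as below. This is the classic document-frequency form of mutual information between a term and a category, MI(t, c) = log(P(t, c) / (P(t) P(c))), not the improved measure from the paper, whose formula is not given in this record. The function name, the set-of-tokens document representation, and the toy corpus are all assumptions made for illustration.

```python
import math

def mutual_information(docs, labels, term, category):
    """Classic document-frequency MI between a term and a category.

    docs   : list of sets of tokens (one set per document), an assumed layout
    labels : list of category names, parallel to docs
    Returns log(P(t, c) / (P(t) * P(c))), or -inf if the term never
    occurs in the category.
    """
    n = len(docs)
    # Documents in the category that contain the term.
    both = sum(1 for d, y in zip(docs, labels) if term in d and y == category)
    # Documents containing the term, in any category.
    with_term = sum(1 for d in docs if term in d)
    # Documents belonging to the category.
    in_category = sum(1 for y in labels if y == category)
    if both == 0 or with_term == 0 or in_category == 0:
        return float("-inf")
    return math.log((both * n) / (with_term * in_category))

# Toy corpus (hypothetical data, for illustration only).
docs = [
    {"ball", "goal"},
    {"ball", "team"},
    {"stock", "market"},
    {"market", "goal"},
]
labels = ["sports", "sports", "finance", "finance"]

for term in ("ball", "goal", "stock"):
    print(term, mutual_information(docs, labels, term, "sports"))
```

Because this score depends only on whether a term appears in a document, a term that occurs once in a document counts the same as one that occurs many times, and a term concentrated in a single document of a class counts the same as one spread evenly across that class. These are the gaps the abstract says the improved measure addresses with in-class feature frequency and in-class dispersion.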
Keywords :
computational complexity; feature extraction; natural language processing; pattern classification; text analysis; Chinese text classification system; computing complexity; corpus category; feature frequency; high-dimensional feature set dimension reduction; improved mutual information method; text categorization; text classification; text feature selection approach; Art; Complexity theory; Computers; Text categorization; feature selection; mutual information; text classification;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Science & Education (ICCSE), 2013 8th International Conference on
Conference_Location :
Colombo
Print_ISBN :
978-1-4673-4464-7
Type :
conf
DOI :
10.1109/ICCSE.2013.6553903
Filename :
6553903