DocumentCode :
1639442
Title :
An Improved TFIDF Feature Selection Algorithm Based On Information Entropy
Author :
Yantao, Zhou ; Jianbo, Tang ; Jiaqin, Wang
Author_Institution :
Hunan Univ., Changsha
fYear :
2007
Firstpage :
312
Lastpage :
315
Abstract :
The quality of text feature selection greatly affects the accuracy of text categorization. Traditional TFIDF does not consider how feature words are distributed among classes. To address this deficiency, the paper analyzes the TFIDF feature selection algorithm and proposes a new TFIDF feature selection method based on the concept of information entropy. Experimental results show that the method improves the accuracy of text categorization.
Keywords :
data mining; text analysis; feature selection algorithm; information entropy; text categorization; text feature selection; algorithm design and analysis; educational institutions; frequency; information analysis; mutual information; TFIDF; feature selection; words information entropy
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Chinese Control Conference, 2007. CCC 2007.
Conference_Location :
Hunan
Print_ISBN :
978-7-81124-055-9
Electronic_ISBN :
978-7-900719-22-5
Type :
conf
DOI :
10.1109/CHICC.2006.4346845
Filename :
4346845
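
The abstract states the idea but gives no formula. The sketch below is only an illustration of that idea, not the authors' method. It assumes one plausible variant: standard TF-IDF multiplied by a factor of 1 minus the normalised entropy of the term's distribution across classes, so that terms spread evenly over all classes are down-weighted. The function name `entropy_weighted_tfidf` and the weighting factor are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy_weighted_tfidf(docs, labels):
    """Illustrative only: TF-IDF scaled by a class-entropy factor.

    docs   -- list of token lists, one per document
    labels -- class label for each document
    Returns one {term: weight} dict per document.
    """
    n_docs = len(docs)
    n_classes = len(set(labels))
    df = Counter()                    # number of documents containing each term
    class_tf = defaultdict(Counter)   # term -> occurrence count per class
    for tokens, label in zip(docs, labels):
        df.update(set(tokens))
        for term in tokens:
            class_tf[term][label] += 1

    def class_factor(term):
        # Assumed factor: 1 - H/H_max, where H is the entropy of the term's
        # distribution over classes. It is 1 when the term occurs in a single
        # class and 0 when it is spread evenly over all classes.
        counts = class_tf[term]
        total = sum(counts.values())
        h = -sum((c / total) * math.log(c / total) for c in counts.values())
        h_max = math.log(n_classes) if n_classes > 1 else 1.0
        return 1.0 - h / h_max

    weights = []
    for tokens in docs:
        tf = Counter(tokens)
        weights.append({
            term: (tf[term] / len(tokens))       # term frequency
                  * math.log(n_docs / df[term])  # inverse document frequency
                  * class_factor(term)           # hypothetical entropy factor
            for term in tf
        })
    return weights
```

With two classes, a term that appears in only one class keeps its plain TF-IDF weight, while a term that occurs equally often in both classes is weighted to zero even when its document frequency is low.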