Title :
An Improved TFIDF Feature Selection Algorithm Based On Information Entropy
Author :
Yantao, Zhou ; Jianbo, Tang ; Jiaqin, Wang
Author_Institution :
Hunan Univ., Changsha
Abstract :
The quality of text feature selection strongly affects the accuracy of text categorization. Traditional TFIDF does not consider how feature words are distributed among classes. To address this deficiency, the paper analyzes the TFIDF feature selection algorithm and proposes a new TFIDF feature selection method that incorporates the concept of information entropy. Experimental results show that the method is effective in improving the accuracy of text categorization.
Keywords :
data mining; text analysis; feature selection algorithm; information entropy; text categorization; text feature selection; Algorithm design and analysis; Educational institutions; Frequency; Information analysis; Mutual information; TFIDF; feature selection; words information entropy
Conference_Title :
Control Conference, 2007. CCC 2007. Chinese
Conference_Location :
Hunan
Print_ISBN :
978-7-81124-055-9
Electronic_ISBN :
978-7-900719-22-5
DOI :
10.1109/CHICC.2006.4346845