DocumentCode
1974336
Title
Improvement and Application of TF•IDF Method Based on Text Classification
Author
Kuang, Qiaoyan ; Xu, Xiaoming
Author_Institution
Comput. Dept., Hunan Int. Econ. Univ., Changsha, China
fYear
2010
fDate
20-22 Aug. 2010
Firstpage
1
Lastpage
4
Abstract
Feature extraction is an important prerequisite for classifying text effectively and automatically. TF·IDF is widely used to express text feature weights, but it has shortcomings: it cannot reflect the distribution of terms in the text, and therefore cannot reflect a term's degree of importance or the differences between categories. This paper proposes a new feature weighting method, TF·IDF·Ci, which adds a new weight Ci to the original TF·IDF to express the differences between classes. When TF·IDF·Ci is combined with a specific classification algorithm, it always yields a larger macro F1 value than TF·IDF. Meanwhile, the standard deviation of the classification index for TF·IDF·Ci is much smaller than that for TF·IDF. This shows that TF·IDF·Ci not only improves classification precision but also, to some extent, reduces sensitivity to the feature dimension.
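The idea of multiplying TF·IDF by a class-dependent factor can be illustrated with a minimal sketch. The abstract does not give the formula for Ci, so the class_factor function below is a hypothetical stand-in, not the authors' definition: it uses the share of documents containing a term that belong to a given class. The function names, the toy corpus, and the tf * log(N / df) form of TF·IDF are likewise assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document: tf * log(N / df)."""
    n_docs = len(docs)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


def class_factor(docs, labels):
    """Hypothetical stand-in for Ci (assumption, not the paper's formula):
    the fraction of documents containing a term that belong to each class."""
    df = Counter()
    df_by_class = Counter()
    for doc, label in zip(docs, labels):
        for term in set(doc):
            df[term] += 1
            df_by_class[(term, label)] += 1
    return {key: n / df[key[0]] for key, n in df_by_class.items()}


def tf_idf_ci(docs, labels):
    """Scale each TF-IDF weight by the class factor of the document's own class."""
    base = tf_idf(docs)
    ci = class_factor(docs, labels)
    return [
        {t: w * ci[(t, label)] for t, w in doc_weights.items()}
        for doc_weights, label in zip(base, labels)
    ]


# Toy labeled corpus (made up for this sketch).
docs = [["stock", "market", "price"], ["stock", "goal"], ["goal", "match", "team"]]
labels = ["finance", "finance", "sport"]
weights = tf_idf_ci(docs, labels)
```

In the second document, "stock" and "goal" have the same plain TF·IDF weight, but "stock" occurs only in finance documents while "goal" is split across both classes, so under this assumed factor "stock" ends up with twice the weight of "goal".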
Keywords
feature extraction; text analysis; TF·IDF method; TF·IDF·Ci method; feature weighting method; text classification; Classification algorithms; Computers; Economics; Sensitivity; Support vector machine classification; Text categorization
fLanguage
English
Publisher
IEEE
Conference_Titel
Internet Technology and Applications, 2010 International Conference on
Conference_Location
Wuhan
Print_ISBN
978-1-4244-5142-5
Electronic_ISBN
978-1-4244-5143-2
Type
conf
DOI
10.1109/ITAPP.2010.5566113
Filename
5566113