DocumentCode :
3723122
Title :
Entropy-Based Term Weighting Schemes for Text Categorization in VSM
Author :
Tao Wang; Yi Cai; Ho-fung Leung; Zhiwei Cai; Huaqing Min
Author_Institution :
Sch. of Software Eng., South China Univ. of Technol., Guangzhou, China
fYear :
2015
Firstpage :
325
Lastpage :
332
Abstract :
Term weighting schemes have been widely used in information retrieval and text categorization models. In this paper, we first investigate the limitations of several state-of-the-art term weighting schemes in the context of text categorization tasks. Considering that category-specific terms are more useful for discriminating between categories, and that these terms tend to have smaller entropy with respect to those categories, we then explore the relationship between a term's discriminating power and its entropy with respect to a set of categories. To this end, we propose two entropy-based term weighting schemes (i.e., tf.dc and tf.bdc) which measure the discriminating power of a term based on its global distributional concentration in the categories of a corpus. To demonstrate the effectiveness of the proposed term weighting schemes, we compare them with seven state-of-the-art schemes on a long-text corpus and a short-text corpus, respectively. Our experimental results show that the proposed schemes outperform the state-of-the-art schemes in text categorization tasks with KNN and SVM.
Keywords :
"Radio frequency","Entropy","Training","Text categorization","Power measurement","Weight measurement","Information retrieval"
Publisher :
IEEE
Conference_Title :
Tools with Artificial Intelligence (ICTAI), 2015 IEEE 27th International Conference on
ISSN :
1082-3409
Type :
conf
DOI :
10.1109/ICTAI.2015.57
Filename :
7372153