Title :
A Novel Term Weighting Scheme for Automated Text Categorization
Author :
Xu, Hongzhi ; Li, Chunping
Author_Institution :
Tsinghua Univ., Beijing
Abstract :
Term weighting is an important task for text classification. Inverse document frequency (IDF) is one of the most popular methods for this task; however, in some situations, such as supervised learning for text categorization, it doesn't weight terms properly, because it neglects the category information and assumes that a term that occurs in a smaller set of documents should get a higher weight. There have been several term weighting schemes that consider the category information. In this paper, we present a new term weighting scheme that considers more information provided by the term distribution among different categories. The experiments show that our method is more effective than three other popular schemes.
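The limitation of IDF described above can be illustrated with a minimal sketch. It assumes the textbook form idf(t) = log(N / df(t)) and a hypothetical four-document corpus. The corpus, the category names, and the function name `idf` are illustrative assumptions, and the sketch does not reproduce the weighting scheme proposed in the paper, which the abstract does not specify.

```python
import math

def idf(term, documents):
    """Textbook IDF: log(N / df). Category labels are never consulted."""
    n_docs = len(documents)
    df = sum(1 for tokens in documents if term in tokens)
    return math.log(n_docs / df) if df else 0.0

# Hypothetical toy corpus of (category, tokens) pairs, for illustration only.
corpus = [
    ("sports", {"match", "goal", "team"}),
    ("sports", {"goal", "league", "report"}),
    ("finance", {"market", "stock", "team"}),
    ("finance", {"stock", "bank", "report"}),
]
documents = [tokens for _, tokens in corpus]

# "goal" occurs only in sports documents; "team" occurs once in each category.
goal_weight = idf("goal", documents)
team_weight = idf("team", documents)
print(goal_weight, team_weight)
```

Both terms occur in two of the four documents, so IDF gives them the same weight, even though "goal" separates the two categories and "team" does not. A supervised scheme that looks at how a term is distributed across categories could tell them apart.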
Keywords :
text analysis; automated text categorization; inverse document frequency; supervised learning; term weighting scheme; text classification; Application software; Design engineering; Filtering; Frequency; Information systems; Intelligent systems; Supervised learning; Systems engineering and theory; Text categorization
Conference_Title :
Intelligent Systems Design and Applications, 2007. ISDA 2007. Seventh International Conference on
Conference_Location :
Rio de Janeiro
Print_ISBN :
978-0-7695-2976-9
DOI :
10.1109/ISDA.2007.26