Title :
Term Weighting Approaches for Text Categorization Improving
Author :
Matsunaga, L.A. ; Ebecken, N.F.F.
Author_Institution :
Fed. District Legislative Assembly
Abstract :
The text categorization problem examined in this paper is the automatic distribution of legislative bills to the committees of the Federal District Legislative Assembly in Brasilia, Brazil. In this study, replacing the idf part of TFIDF with a new term selection measure (absl logit) and with bi-normal separation produced the best overall classification results with support vector machine (SVM) models, compared with standard TFIDF and with replacing the idf part with common term selection measures: chi-square, information gain, gain ratio, and odds ratio.
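The idea of swapping the idf factor for a supervised term selection score can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the commonly used definition of bi-normal separation, |F⁻¹(tpr) − F⁻¹(fpr)| with F⁻¹ the inverse standard normal CDF and rates clipped away from 0 and 1, and it assumes a binary (one category versus the rest) setting. The function names, the clipping value `eps`, and the toy documents are all hypothetical.

```python
from collections import Counter
from statistics import NormalDist

_INV = NormalDist().inv_cdf  # inverse CDF of the standard normal distribution


def bns(tp, fp, pos, neg, eps=0.0005):
    """Bi-normal separation of one term for one category (assumed definition).

    tp / fp: number of positive / negative documents containing the term.
    pos / neg: total number of positive / negative documents (both must be > 0).
    Rates are clipped to [eps, 1 - eps] so the inverse CDF stays finite.
    """
    tpr = min(max(tp / pos, eps), 1 - eps)
    fpr = min(max(fp / neg, eps), 1 - eps)
    return abs(_INV(tpr) - _INV(fpr))


def tf_bns_weights(docs, labels):
    """Weight each term by raw term frequency times its BNS score.

    docs: list of token lists; labels: list of bools (document in category?).
    Returns one {term: weight} dict per document.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    df_pos, df_neg = Counter(), Counter()
    for tokens, in_category in zip(docs, labels):
        # Document frequency counts each term at most once per document.
        (df_pos if in_category else df_neg).update(set(tokens))
    vocab = set(df_pos) | set(df_neg)
    score = {t: bns(df_pos[t], df_neg[t], pos, neg) for t in vocab}
    return [
        {t: tf * score[t] for t, tf in Counter(tokens).items()}
        for tokens in docs
    ]


# Hypothetical toy corpus: two bills for a "health" committee, two for others.
docs = [
    ["health", "budget"],
    ["health", "hospital"],
    ["tax", "budget"],
    ["tax", "revenue"],
]
labels = [True, True, False, False]
weights = tf_bns_weights(docs, labels)
```

In this toy corpus, "budget" appears equally often inside and outside the category, so its weight collapses to zero, while "health" appears only inside the category and receives a large weight. The resulting per-document weight dictionaries could then be turned into feature vectors for an SVM classifier.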
Keywords :
category theory; support vector machines; text analysis; support vector machines models; term selection measures; term weighting; text categorization; Assembly systems; Dictionaries; Frequency; Gain measurement; Intelligent systems; Support vector machine classification; Text mining; Vocabulary; text
Conference_Title :
Intelligent Systems Design and Applications, 2008. ISDA '08. Eighth International Conference on
Conference_Location :
Kaohsiung
Print_ISBN :
978-0-7695-3382-7
DOI :
10.1109/ISDA.2008.21