Title :
A symmetric term weighting scheme for text categorization based on term occurrence probabilities
Author :
Zafer Erenel;Hakan Altinçay;Ekrem Varoğlu
Author_Institution :
Department of Computer Engineering, Eastern Mediterranean University, Famagusta, North Cyprus
Abstract :
Term weighting schemes used in text categorization can be considered as functions of term occurrence probabilities in the positive and negative classes. In this paper, widely used weighting schemes are first evaluated from this perspective. Then, a novel feature weighting scheme based on term occurrence probabilities is proposed. Experiments conducted using an SVM classifier on the Reuters-21578 ModApte Top10 dataset show that the proposed method outperforms other well-known measures such as chi-square (CHI), information gain (IG), odds ratio (OR) and relevance frequency (RF) in terms of macro-F1 and micro-F1 scores.
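The abstract's view of term weights as functions of occurrence probabilities can be illustrated with a minimal Python sketch. It uses the commonly cited textbook definitions of CHI, IG, OR and RF computed from a 2×2 term/class contingency table, which is itself rebuilt from P(t|positive), P(t|negative) and the class sizes. The function names, the count notation (a, b, c, d) and the 0.5 smoothing in OR are assumptions for illustration. The sketch does not implement the scheme proposed in the paper.

```python
import math

# Contingency counts for a term t and the positive class (hypothetical naming):
#   a = positive docs containing t,  b = positive docs without t
#   c = negative docs containing t,  d = negative docs without t


def counts_from_probs(p_pos, p_neg, n_pos, n_neg):
    """Rebuild (a, b, c, d) from P(t|positive), P(t|negative) and class sizes."""
    a, c = p_pos * n_pos, p_neg * n_neg
    return a, n_pos - a, c, n_neg - c


def chi_square(a, b, c, d):
    """Chi-square statistic of the 2x2 term/class table."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def info_gain(a, b, c, d):
    """Sum of P(t, c) * log2(P(t, c) / (P(t) * P(c))) over the four cells."""
    n = a + b + c + d
    cells = (  # (joint count, term-presence marginal, class marginal)
        (a, a + c, a + b), (b, b + d, a + b),
        (c, a + c, c + d), (d, b + d, c + d),
    )
    # Empty cells contribute 0; a non-empty cell implies non-zero marginals.
    return sum((j / n) * math.log2(j * n / (tm * cm)) for j, tm, cm in cells if j > 0)


def odds_ratio(a, b, c, d, smooth=0.5):
    """Natural-log odds ratio log(a*d / (b*c)).

    The additive smoothing (default 0.5) is an assumption used to avoid
    division by zero, not a detail taken from the paper.
    """
    return math.log(((a + smooth) * (d + smooth)) / ((b + smooth) * (c + smooth)))


def relevance_frequency(a, c):
    """RF as usually defined: log2(2 + a / max(1, c))."""
    return math.log2(2 + a / max(1, c))


if __name__ == "__main__":
    n_pos, n_neg = 100, 900  # illustrative class sizes, not from the paper
    for p_pos, p_neg in [(0.4, 0.05), (0.05, 0.4), (0.25, 0.25)]:
        a, b, c, d = counts_from_probs(p_pos, p_neg, n_pos, n_neg)
        print(f"p+={p_pos:.2f} p-={p_neg:.2f}  CHI={chi_square(a, b, c, d):8.2f}  "
              f"IG={info_gain(a, b, c, d):.4f}  OR={odds_ratio(a, b, c, d):+.2f}  "
              f"RF={relevance_frequency(a, c):.2f}")
```

Running it compares a term concentrated in the positive class with the same probabilities assigned to the opposite classes. CHI and IG score both terms as informative because they ignore which class the term favours, whereas OR changes sign and RF stays close to its minimum value of 1 for the term concentrated in the negative class.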
Keywords :
"Text categorization","Weight measurement","Frequency measurement","Support vector machines","Radio frequency","Support vector machine classification","Gain measurement","Extraterrestrial measurements","Logic","Robustness"
Conference_Title :
2009 Fifth International Conference on Soft Computing, Computing with Words and Perceptions in System Analysis, Decision and Control (ICSCCW 2009)
Print_ISBN :
978-1-4244-3429-9
DOI :
10.1109/ICSCCW.2009.5379438