Title :
Scoring and selecting terms for text categorization
Author :
Montanés, Elena ; Díaz, Irene ; Ranilla, José ; Combarro, Elías F. ; Fernández, Javier
Author_Institution :
Oviedo Univ., Spain
Abstract :
We propose a set of machine learning (ML)-based scoring measures for conducting feature selection. We've tested these measures on documents from two well-known corpora, comparing them with other measures previously applied for this purpose. In particular, we've analyzed which measure obtains the best overall classification performance in terms of metrics such as precision and recall, emphasizing to what extent some statistical properties of the corpus affect performance. The results show that some of our measures outperform the traditional measures in certain situations.
Keywords :
classification; feature extraction; information retrieval; learning (artificial intelligence); text analysis; word processing; ML-based scoring measures; feature selection; machine learning; text categorization; Entropy; Frequency estimation; Frequency measurement; Gain measurement; Information theory; Learning systems; Probability; Statistical distributions; support vector machines
Journal_Title :
Intelligent Systems, IEEE
DOI :
10.1109/MIS.2005.49