DocumentCode
832811
Title
Scoring and selecting terms for text categorization
Author
Montanés, Elena ; Díaz, Irene ; Ranilla, José ; Combarro, Elías F. ; Fernández, Javier
Author_Institution
Oviedo Univ., Spain
Volume
20
Issue
3
fYear
2005
Firstpage
40
Lastpage
47
Abstract
We propose a set of machine learning (ML)-based scoring measures for conducting feature selection. We've tested these measures on documents from two well-known corpora, comparing them with other measures previously applied for this purpose. In particular, we've analyzed which measure obtains the best overall classification performance in terms of metrics such as precision and recall, emphasizing the extent to which some statistical properties of the corpus affect performance. The results show that some of our measures outperform the traditional measures in certain situations.
Keywords
classification; feature extraction; information retrieval; learning (artificial intelligence); text analysis; word processing; ML-based scoring measures; feature selection; machine learning; text categorization; Entropy; Frequency estimation; Frequency measurement; Gain measurement; Information theory; Learning systems; Probability; Statistical distributions; support vector machines
fLanguage
English
Journal_Title
Intelligent Systems, IEEE
Publisher
IEEE
ISSN
1541-1672
Type
jour
DOI
10.1109/MIS.2005.49
Filename
1439478