DocumentCode :
832811
Title :
Scoring and selecting terms for text categorization
Author :
Montañés, Elena ; Díaz, Irene ; Ranilla, José ; Combarro, Elías F. ; Fernández, Javier
Author_Institution :
Oviedo Univ., Spain
Volume :
20
Issue :
3
fYear :
2005
Firstpage :
40
Lastpage :
47
Abstract :
We propose a set of machine learning (ML)-based scoring measures for conducting feature selection. We've tested these measures on documents from two well-known corpora, comparing them with other measures previously applied for this purpose. In particular, we've analyzed which measure obtains the best overall classification performance in terms of properties such as precision and recall, emphasizing to what extent some statistical properties of the corpus affect performance. The results show that some of our measures outperform the traditional measures in certain situations.
Keywords :
classification; feature extraction; information retrieval; learning (artificial intelligence); text analysis; word processing; ML-based scoring measures; feature selection; machine learning; text categorization; Entropy; Frequency estimation; Frequency measurement; Gain measurement; Information theory; Learning systems; Probability; Statistical distributions; support vector machines
fLanguage :
English
Journal_Title :
Intelligent Systems, IEEE
Publisher :
IEEE
ISSN :
1541-1672
Type :
jour
DOI :
10.1109/MIS.2005.49
Filename :
1439478