Regional Information Center for Science and Technology - Scoring and selecting terms for text categorization

DocumentCode :

832811

Title :

Scoring and selecting terms for text categorization

Author :

Montañés, Elena ; Díaz, Irene ; Ranilla, José ; Combarro, Elías F. ; Fernández, Javier

Author_Institution :

Oviedo Univ., Spain

Volume :

Issue :

fYear :

2005

Firstpage :

Lastpage :

Abstract :

We propose a set of machine learning (ML)-based scoring measures for conducting feature selection. We've tested these measures on documents from two well-known corpora, comparing them with other measures previously applied for this purpose. In particular, we've analyzed which measure obtains the best overall classification performance in terms of properties such as precision and recall, emphasizing to what extent some statistical properties of the corpus affect performance. The results show that some of our measures outperform the traditional measures in certain situations.

Keywords :

classification; feature extraction; information retrieval; learning (artificial intelligence); text analysis; word processing; ML-based scoring measures; feature selection; machine learning; text categorization; Entropy; Frequency estimation; Frequency measurement; Gain measurement; Information theory; Learning systems; Probability; Statistical distributions; support vector machines

fLanguage :

English

Journal_Title :

Intelligent Systems, IEEE

Publisher :

IEEE

ISSN :

1541-1672

Type :

jour

DOI :

10.1109/MIS.2005.49

Filename :

1439478

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=832811