DocumentCode
1694185
Title
A Model for Term Selection in Text Categorization Problems
Author
Cannas, Laura Maria ; Dessì, Nicoletta ; Dessì, Stefania
Author_Institution
Dipt. di Mat. e Inf., Univ. degli Studi di Cagliari, Cagliari, Italy
fYear
2012
Firstpage
169
Lastpage
173
Abstract
In the last ten years, automatic Text Categorization (TC) has attracted increasing interest from the research community, owing to the need to organize massive numbers of digital documents. Following a machine learning paradigm, this paper presents a model that treats TC as a classification task supported by a wrapper approach and combines a Genetic Algorithm (GA) with a filter. First, a filter is used to weigh the relevance of terms in documents. Then, the top-ranked terms are grouped into several nested sets of relatively small size. These sets are explored by a GA, which extracts the subset of terms that best categorize documents. Experimental results on the Reuters-21578 dataset demonstrate the effectiveness of the proposed model and its competitiveness with the learning approaches proposed in the TC literature.
Keywords
genetic algorithms; information filtering; learning (artificial intelligence); natural language processing; pattern classification; text analysis; GA; TC; automatic text categorization problem; best categorize documents; classification task; digital documents; genetic algorithm; machine learning paradigm; natural language documents; term selection; text filter; top-ranked terms; Classification algorithms; Filtering algorithms; Genetic algorithms; Machine learning; Measurement; Support vector machines; Text categorization; hybrid model; text categorization
fLanguage
English
Publisher
IEEE
Conference_Titel
Database and Expert Systems Applications (DEXA), 2012 23rd International Workshop on
Conference_Location
Vienna
ISSN
1529-4188
Print_ISBN
978-1-4673-2621-6
Type
conf
DOI
10.1109/DEXA.2012.41
Filename
6327421