• DocumentCode
    1694185
  • Title
    A Model for Term Selection in Text Categorization Problems
  • Author
    Cannas, Laura Maria; Dessì, Nicoletta; Dessì, Stefania
  • Author_Institution
    Dipt. di Mat. e Inf., Univ. degli Studi di Cagliari, Cagliari, Italy
  • fYear
    2012
  • Firstpage
    169
  • Lastpage
    173
  • Abstract
    Over the last ten years, automatic Text Categorization (TC) has attracted increasing interest from the research community, driven by the need to organize massive numbers of digital documents. Following a machine learning paradigm, this paper presents a model that treats TC as a classification task supported by a wrapper approach, combining a Genetic Algorithm (GA) with a filter. First, a filter weighs the relevance of terms in documents. Then, the top-ranked terms are grouped into several nested sets of relatively small size. These sets are explored by a GA, which extracts the subset of terms that best categorizes documents. Experimental results on the Reuters-21578 dataset demonstrate the effectiveness of the proposed model and its competitiveness with the learning approaches proposed in the TC literature.
  • Keywords
    genetic algorithms; information filtering; learning (artificial intelligence); natural language processing; pattern classification; text analysis; GA; TC; automatic text categorization problem; best categorize documents; classification task; digital documents; genetic algorithm; machine learning paradigm; natural language documents; term selection; text filter; top-ranked terms; Classification algorithms; Filtering algorithms; Genetic algorithms; Machine learning; Measurement; Support vector machines; Text categorization; hybrid model; text categorization
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Database and Expert Systems Applications (DEXA), 2012 23rd International Workshop on
  • Conference_Location
    Vienna
  • ISSN
    1529-4188
  • Print_ISBN
    978-1-4673-2621-6
  • Type
    conf
  • DOI
    10.1109/DEXA.2012.41
  • Filename
    6327421
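
The three steps named in the abstract (filter ranking, nested sets of top-ranked terms, GA search over each set) can be sketched roughly as below. This is a minimal illustration, not the authors' implementation: the abstract does not specify the filter, the classifier, or the GA settings, so the chi-square filter, the simple voting classifier used as wrapper fitness, the GA operators and parameters, and all function names here are assumptions.

```python
import random
from collections import Counter

def chi2_scores(docs, labels):
    """Filter step (assumed chi-square): score each term against a binary label."""
    n, n_pos = len(docs), sum(labels)
    df = Counter(t for doc in docs for t in set(doc))
    df_pos = Counter(t for doc, y in zip(docs, labels) if y for t in set(doc))
    scores = {}
    for term, total in df.items():
        a = df_pos[term]          # positive docs containing the term
        b = total - a             # negative docs containing the term
        c = n_pos - a             # positive docs without the term
        d = (n - n_pos) - b       # negative docs without the term
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores

def nested_sets(scores, sizes):
    """Group the top-ranked terms into nested sets (top-k for each k in sizes)."""
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return [ranked[:k] for k in sizes]

def make_fitness(docs, labels):
    """Wrapper fitness (assumed): accuracy of a term-voting classifier on the
    labeled docs, minus a small penalty per selected term."""
    pos_df = Counter(t for doc, y in zip(docs, labels) if y for t in set(doc))
    neg_df = Counter(t for doc, y in zip(docs, labels) if not y for t in set(doc))
    def fitness(selected):
        correct = 0
        for doc, y in zip(docs, labels):
            vote = sum((pos_df[t] > neg_df[t]) - (pos_df[t] < neg_df[t])
                       for t in selected if t in doc)
            correct += (vote > 0) == bool(y)
        return correct / len(docs) - 0.001 * len(selected)
    return fitness

def ga_select(terms, fitness, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """GA over bit masks: truncation selection, one-point crossover,
    bit-flip mutation, and elitism. All settings are illustrative."""
    rng = random.Random(seed)
    n = len(terms)
    def score(mask):
        return fitness([t for t, keep in zip(terms, mask) if keep])
    pop = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=score)
    for _ in range(generations):
        ranked = sorted(pop, key=score, reverse=True)
        parents = ranked[: pop_size // 2]
        children = [ranked[0]]                      # keep the current best
        while len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = p1[:cut] + p2[cut:]
            child = [(not g) if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
        best = max(pop + [best], key=score)
    return [t for t, keep in zip(terms, best) if keep]

def select_terms(docs, labels, sizes, seed=0):
    """Run the GA on each nested set and return the fittest term subset."""
    fitness = make_fitness(docs, labels)
    sets = nested_sets(chi2_scores(docs, labels), sizes)
    candidates = [ga_select(s, fitness, seed=seed) for s in sets]
    return max(candidates, key=fitness)
```

Here `docs` is a list of token lists and `labels` a list of 0/1 category flags; a call such as `select_terms(docs, labels, sizes=[10, 20, 40])` returns one term subset. Fitness is measured on the same labeled documents only to keep the sketch short; a held-out split would be the usual choice in practice.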