• DocumentCode
    832811
  • Title

    Scoring and selecting terms for text categorization

  • Author

    Montañés, Elena ; Díaz, Irene ; Ranilla, José ; Combarro, Elías F. ; Fernández, Javier

  • Author_Institution
    Oviedo Univ., Spain
  • Volume
    20
  • Issue
    3
  • fYear
    2005
  • Firstpage
    40
  • Lastpage
    47
  • Abstract
    We propose a set of machine learning (ML)-based scoring measures for conducting feature selection. We've tested these measures on documents from two well-known corpora, comparing them with other measures previously applied for this purpose. In particular, we've analyzed which measure obtains the best overall classification performance in terms of properties such as precision and recall, emphasizing to what extent some statistical properties of the corpus affect performance. The results show that some of our measures outperform the traditional measures in certain situations.
  • Keywords
    classification; feature extraction; information retrieval; learning (artificial intelligence); text analysis; word processing; ML-based scoring measures; feature selection; machine learning; text categorization; Entropy; Frequency estimation; Frequency measurement; Gain measurement; Information theory; Learning systems; Probability; Statistical distributions; support vector machines
  • fLanguage
    English
  • Journal_Title
    Intelligent Systems, IEEE
  • Publisher
    ieee
  • ISSN
    1541-1672
  • Type

    jour

  • DOI
    10.1109/MIS.2005.49
  • Filename
    1439478