• DocumentCode
    1921600
  • Title
    The importance of stop word removal on recall values in text categorization
  • Author
    Silva, Catarina ; Ribeiro, Bernardete
  • Author_Institution
    Dept. de Engenharia Inf., Coimbra Univ., Portugal
  • Volume
    3
  • fYear
    2003
  • fDate
    20-24 July 2003
  • Firstpage
    1661
  • Abstract
    Given a data set and a learning task such as classification, there are two prime motives for executing some kind of data set reduction. On one hand there is the possible algorithm performance improvement. On the other hand the decrease in the overall size of the data set can bring advantages in storage space used and time spent computing. Our purpose is to determine the importance of several basic reduction techniques on Support Vector Machines, by comparing their relative performance improvement when applied on the standard REUTERS-21578 benchmark.
  • Keywords
    classification; data reduction; indexing; information retrieval; support vector machines; text editing; REUTERS-21578 benchmark; algorithm performance improvement; data set reduction; learning task; stop word removal; storage space; support vector machines; text categorization; text classification; Humans; Information retrieval; Internet; Large-scale systems; Support vector machine classification; Support vector machines; Taxonomy; Text categorization; Text mining; Text processing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Networks, 2003. Proceedings of the International Joint Conference on
  • ISSN
    1098-7576
  • Print_ISBN
    0-7803-7898-9
  • Type
    conf
  • DOI
    10.1109/IJCNN.2003.1223656
  • Filename
    1223656