• DocumentCode
    388790
  • Title
    Toward semi-automatic construction of training-corpus for text classification
  • Author
    Guan, Jihong ; Zhou, Shuigeng
  • Author_Institution
    Sch. of Comput. Sci., Wuhan Univ., China
  • Volume
    4
  • fYear
    2002
  • fDate
    6-9 Oct. 2002
  • Abstract
    Text classification is becoming increasingly important with the rapid growth of available online information. It has been observed that the quality of the training corpus affects the performance of the trained classifier. This paper proposes an approach to building high-quality training corpuses for better classification performance by first exploring the properties of training corpuses and then giving an algorithm for constructing training corpuses semi-automatically. Preliminary experimental results validate our approach: classifiers based on the training corpuses constructed by our approach can achieve good performance while the training corpus's size is significantly compressed. Our approach can be used for building an efficient and lightweight classification system.
  • Keywords
    classification; information retrieval; natural languages; text analysis; Chinese text; experimental results; natural language; online information; performance; semi-automatic training corpus development; text classification; Algorithm design and analysis; Buildings; Computer science; Information retrieval; Machine learning; Organizing; Pattern recognition; Software engineering; Text categorization
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Systems, Man and Cybernetics, 2002 IEEE International Conference on
  • ISSN
    1062-922X
  • Print_ISBN
    0-7803-7437-1
  • Type
    conf
  • DOI
    10.1109/ICSMC.2002.1173245
  • Filename
    1173245