DocumentCode
388790
Title
Toward semi-automatic construction of training-corpus for text classification
Author
Guan, Jihong ; Zhou, Shuigeng
Author_Institution
Sch. of Comput. Sci., Wuhan Univ., China
Volume
4
fYear
2002
fDate
6-9 Oct. 2002
Abstract
Text classification is becoming more and more important with the rapid growth of information available online. It has been observed that the quality of the training corpus affects the performance of the trained classifier. This paper proposes an approach to building high-quality training corpuses for better classification performance by first exploring the properties of training corpuses, and then giving an algorithm for constructing training corpuses semi-automatically. Preliminary experimental results validate our approach: classifiers based on the training corpuses constructed by our approach can achieve good performance while the size of the training corpus is significantly compressed. Our approach can be used for building an efficient and lightweight classification system.
Keywords
classification; information retrieval; natural languages; text analysis; Chinese text; experimental results; natural language; online information; performance; semi-automatic training corpus development; text classification; Algorithm design and analysis; Buildings; Computer science; Information retrieval; Machine learning; Organizing; Pattern recognition; Software engineering; Text categorization
fLanguage
English
Publisher
IEEE
Conference_Titel
Systems, Man and Cybernetics, 2002 IEEE International Conference on
ISSN
1062-922X
Print_ISBN
0-7803-7437-1
Type
conf
DOI
10.1109/ICSMC.2002.1173245
Filename
1173245