DocumentCode
420953
Title
Design and implementation of a multi-label Chinese text categorization system
Author
Chen, Junli ; Zhou, Xuezhong ; Wu, Zhaohui
Author_Institution
Coll. of Comput. Sci. & Technol., Zhejiang Univ., Hangzhou, China
Volume
3
fYear
2004
fDate
15-19 June 2004
Firstpage
1885
Abstract
Based on the Chinese character representation and the boosting algorithm, a multi-label Chinese text categorization system is demonstrated. The system has been successfully tested on two multi-labeled datasets: the traditional Chinese medicine (TCM) dataset TCM-MED and Reuters-21578. Experiments have also been carried out to compare the performance of the boosting algorithm with that of two other traditional algorithms on these two datasets. The results indicate that the boosting algorithm outperforms the other two algorithms in Chinese text categorization.
Keywords
character recognition; feature extraction; learning (artificial intelligence); text analysis; Chinese character representation; boosting algorithm; multilabel Chinese text categorization system; multilabeled datasets; traditional Chinese medicine dataset; Boosting; Computer science; Educational institutions; Medical tests; Postal services; System testing; Text categorization
fLanguage
English
Publisher
IEEE
Conference_Titel
Intelligent Control and Automation, 2004. WCICA 2004. Fifth World Congress on
Print_ISBN
0-7803-8273-0
Type
conf
DOI
10.1109/WCICA.2004.1341906
Filename
1341906