  • DocumentCode
    420953
  • Title
    Design and implementation of a multi-label Chinese text categorization system
  • Author
    Chen, Junli; Zhou, Xuezhong; Wu, Zhaohui
  • Author_Institution
    Coll. of Comput. Sci. & Technol., Zhejiang Univ., Hangzhou, China
  • Volume
    3
  • fYear
    2004
  • fDate
    15-19 June 2004
  • Firstpage
    1885
  • Abstract
    A multi-label Chinese text categorization system based on a Chinese character representation and the boosting algorithm is demonstrated. The system has been successfully tested on two multi-labeled datasets: a traditional Chinese medicine (TCM) dataset, TCM-MED, and Reuters-21578. Experiments were also carried out on both datasets to compare the performance of the boosting algorithm with that of two other traditional algorithms. The results indicate that the boosting algorithm outperforms the other two algorithms in Chinese text categorization.
  • Keywords
    character recognition; feature extraction; learning (artificial intelligence); text analysis; Chinese character representation; boosting algorithm; multilabel Chinese text categorization system; multilabeled datasets; traditional Chinese medicine dataset; Boosting; Computer science; Educational institutions; Medical tests; Postal services; System testing; Text categorization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Control and Automation, 2004. WCICA 2004. Fifth World Congress on
  • Print_ISBN
    0-7803-8273-0
  • Type
    conf
  • DOI
    10.1109/WCICA.2004.1341906
  • Filename
    1341906