• DocumentCode
    1921577
  • Title
    A multi-label Chinese text categorization system based on boosting algorithm
  • Author
    Chen, Junli ; Zhou, Xuezhong ; Wu, Zhaohui
  • Author_Institution
    Coll. of Comput. Sci., Zhejiang Univ., China
  • fYear
    2004
  • fDate
    14-16 Sept. 2004
  • Firstpage
    1153
  • Lastpage
    1158
  • Abstract
    This paper presents a multi-label Chinese text categorization system based on Chinese character features and a boosting algorithm. The system has been successfully evaluated on the TCM-MED dataset, provided by the China Academy of Traditional Chinese Medicine (TCM), and on the Reuters-21578 benchmark. We suggest that the TCM-MED dataset can serve as a standard corpus for Chinese text categorization tasks. We also carried out experiments comparing the performance of the boosting algorithm with that of two other traditional algorithms on the same datasets. The results indicate that, for a multi-label Chinese text categorization system, the boosting algorithm performs well and outperforms the other two algorithms.
  • Keywords
    classification; natural languages; text analysis; China Academy; Chinese character features; Reuters-21578 benchmark; TCM-MED dataset; boosting algorithm; multilabel Chinese text categorization system; traditional Chinese medicine; Algorithm design and analysis; Boosting; Dispatching; Document handling; Educational institutions; Information processing; Machine learning; Machine learning algorithms; Nearest neighbor searches; Text categorization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer and Information Technology, 2004. CIT '04. The Fourth International Conference on
  • Print_ISBN
    0-7695-2216-5
  • Type
    conf
  • DOI
    10.1109/CIT.2004.1357350
  • Filename
    1357350