• DocumentCode
    3318194
  • Title
    Improving Chinese text categorization by outlier learning
  • Author
    Wang, Xinhao; Luo, Dingsheng; Wu, Xihong; Chi, Huisheng
  • Author_Institution
    Nat. Lab. on Machine Perception, Peking Univ., Beijing, China
  • fYear
    2005
  • fDate
    30 Oct.-1 Nov. 2005
  • Firstpage
    602
  • Lastpage
    607
  • Abstract
    Text categorization is a typical machine learning task that suffers from incomplete training data. A main cause is the presence of outliers in the training data, such as nonsense documents, mislabeled documents, documents lying on the border between categories, and documents that fall outside the defined categories. Outlier learning techniques can therefore be adopted to improve text categorization. In this paper, an outlier-learning-based text categorization system is proposed, in which the AdaBoost algorithm is used to identify outliers. Simulation results show that the new system improves learning performance in text categorization.
  • Keywords
    classification; learning (artificial intelligence); text analysis; AdaBoost algorithm; Chinese text categorization; incomplete training data; machine learning; outlier learning; Boosting; Classification tree analysis; Feature extraction; Laboratories; Learning systems; Machine learning; Nearest neighbor searches; Pattern recognition; Text categorization; Training data;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Natural Language Processing and Knowledge Engineering, 2005. IEEE NLP-KE '05. Proceedings of 2005 IEEE International Conference on
  • Print_ISBN
    0-7803-9361-9
  • Type
    conf
  • DOI
    10.1109/NLPKE.2005.1598808
  • Filename
    1598808