• DocumentCode
    3318213
  • Title
    Learning effective features for Chinese text categorization
  • Author
    Luo, Dingsheng ; Wang, Xinhao ; Wu, Xihong ; Chi, Huisheng
  • Author_Institution
    Nat. Lab. on Machine Perception, Peking Univ., Beijing, China
  • fYear
    2005
  • fDate
    30 Oct.-1 Nov. 2005
  • Firstpage
    608
  • Lastpage
    613
  • Abstract
    Text categorization always suffers from high dimensionality, which leaves the learning system with either low efficiency or low performance. A number of feature selection methods, such as DF, IG, and Chi-square, have therefore been adopted or proposed for dimensionality reduction. Unlike these traditional feature selection methods, this paper presents a feature selection method based on the idea of "discriminative learning", in which learned "effective" features, rather than traditional "important" features, are used to construct the feature space. To learn the effective features, a variant of the AdaBoost algorithm and a pairwise multiclass learning scheme are adopted. Simulation results show that the presented method works well.
  • Keywords
    classification; feature extraction; learning (artificial intelligence); text analysis; Chinese text categorization; dimensional reduction; discriminative learning; feature selection methods; pairwise multiclass learning scheme; variant AdaBoost algorithm; Bayesian methods; Classification tree analysis; Dictionaries; Feature extraction; Frequency; Information management; Learning systems; Machine learning; Nearest neighbor searches; Text categorization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Language Processing and Knowledge Engineering, 2005. IEEE NLP-KE '05. Proceedings of 2005 IEEE International Conference on
  • Print_ISBN
    0-7803-9361-9
  • Type
    conf
  • DOI
    10.1109/NLPKE.2005.1598809
  • Filename
    1598809