• DocumentCode
    3133936
  • Title
    A feature selection based on deviation from feature centroid for text categorization
  • Author
    Yang, Jieming ; Liu, Zhiying
  • Author_Institution
    Coll. of Inf. Eng., Northeast Dianli Univ., Jilin, China
  • Volume
    1
  • fYear
    2011
  • fDate
    25-28 July 2011
  • Firstpage
    180
  • Lastpage
    184
  • Abstract
    Text categorization is vital in helping people automatically process information, the volume of which is growing exponentially. However, the high dimensionality of the vector space is a major obstacle to applying many sophisticated learning algorithms to text categorization, so feature selection has become a research focus in this field. In this paper, we propose a new feature selection method, named FCFS, which uses the deviation from the feature centroid over all categories as the score of a feature. We compare the proposed method with four well-known feature selection methods using two classification algorithms on three datasets. The experiments show that the proposed method is significantly better than information gain, orthogonal centroid feature selection, mutual information and odds ratio in terms of accuracy when the Naïve Bayes classifier and Support Vector Machines are used.
  • Keywords
    Bayes methods; feature extraction; learning (artificial intelligence); pattern classification; support vector machines; text analysis; FCFS; Naive Bayes classifier; feature selection; information gain; learning algorithm; orthogonal centroid feature selection; support vector machine; text categorization; vector space; Accuracy; Machine learning; Mutual information; Support vector machine classification; Training; feature vector space
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Control and Information Processing (ICICIP), 2011 2nd International Conference on
  • Conference_Location
    Harbin
  • Print_ISBN
    978-1-4577-0813-8
  • Type
    conf
  • DOI
    10.1109/ICICIP.2011.6008227
  • Filename
    6008227