• DocumentCode
    3301365
  • Title

    Divergence-based feature selection for naïve Bayes text classification

  • Author

    Wang, Huizhen ; Zhu, Jingbo ; Su, Keh-Yih

  • Author_Institution
    Natural Language Process. Lab., Northeastern Univ., Shenyang
  • fYear
    2008
  • fDate
    19-22 Oct. 2008
  • Firstpage
    1
  • Lastpage
    7
  • Abstract
    This paper proposes a new divergence-based approach to feature selection for naive Bayes text classification. The approach ranks features directly by their discrimination power, using a criterion named overall-divergence that is based on divergence measures evaluated between pairs of class density functions. Compared with other state-of-the-art algorithms, such as information gain (IG) and the chi-square statistic (CHI), the proposed approach shows more discrimination power when classifying easily confused classes and achieves better or comparable performance on the evaluation data sets.
  • Keywords
    Bayes methods; classification; text analysis; divergence measure; divergence-based feature selection; feature ranking; naive Bayes text classification; overall-divergence; Density functional theory; Density measurement; Indexing; Information retrieval; Laboratories; Natural language processing; Power measurement; Testing; Text categorization; Text processing; Divergence-based; feature selection; text classification
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Language Processing and Knowledge Engineering, 2008. NLP-KE '08. International Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4244-4515-8
  • Electronic_ISBN
    978-1-4244-2780-2
  • Type

    conf

  • DOI
    10.1109/NLPKE.2008.4906808
  • Filename
    4906808