DocumentCode
3301365
Title
Divergence-based feature selection for naïve Bayes text classification
Author
Wang, Huizhen ; Zhu, Jingbo ; Su, Keh-Yih
Author_Institution
Natural Language Processing Lab., Northeastern University, Shenyang
fYear
2008
fDate
19-22 Oct. 2008
Firstpage
1
Lastpage
7
Abstract
This paper proposes a new divergence-based approach to feature selection for naïve Bayes text classification. The approach ranks features directly by their discrimination power, using a criterion named overall-divergence that is based on divergence measures evaluated between pairs of class density functions. Compared with other state-of-the-art algorithms (e.g., information gain (IG) and chi-square (CHI)), the proposed approach shows more discrimination power on easily confused classes and achieves better or comparable performance on the evaluation data sets.
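The ranking idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact overall-divergence formula, so everything below is an assumption made for illustration: a Bernoulli (word present/absent) model per class, Laplace smoothing, the symmetric Kullback-Leibler (J) divergence as the pairwise measure, and an unweighted sum over all class pairs. The function names `class_feature_probs`, `j_divergence`, and `rank_features` are hypothetical and do not come from the paper.

```python
import math
from itertools import combinations


def class_feature_probs(docs, labels, vocab, alpha=1.0):
    """Estimate P(word present | class) with Laplace smoothing (assumed Bernoulli model)."""
    probs = {}
    for c in sorted(set(labels)):
        class_docs = [set(d) for d, y in zip(docs, labels) if y == c]
        n = len(class_docs)
        probs[c] = {
            w: (sum(w in d for d in class_docs) + alpha) / (n + 2 * alpha)
            for w in vocab
        }
    return probs


def j_divergence(p, q):
    """Symmetric KL divergence between Bernoulli(p) and Bernoulli(q), with 0 < p, q < 1."""
    return (p - q) * math.log(p / q) + (q - p) * math.log((1 - p) / (1 - q))


def rank_features(docs, labels, vocab):
    """Rank words by the sum of pairwise class divergences (an assumed stand-in criterion)."""
    probs = class_feature_probs(docs, labels, vocab)
    scores = {
        w: sum(j_divergence(probs[a][w], probs[b][w]) for a, b in combinations(probs, 2))
        for w in vocab
    }
    return sorted(vocab, key=lambda w: scores[w], reverse=True)


# Toy corpus: each document is a list of tokens.
docs = [
    ["goal", "match", "the"],
    ["goal", "team", "the"],
    ["vote", "party", "the"],
    ["vote", "election", "the"],
]
labels = ["sport", "sport", "politics", "politics"]
vocab = sorted({w for d in docs for w in d})
ranked = rank_features(docs, labels, vocab)
```

On this toy corpus, words that occur in only one class ("goal", "vote") rank highest, while "the", which occurs in every document, scores zero and ranks last. A real criterion may weight class pairs differently, for example by class priors; this sketch treats all pairs equally.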
Keywords
Bayes methods; classification; text analysis; divergence measure; divergence-based feature selection; feature ranking; naive Bayes text classification; overall-divergence; Density functional theory; Density measurement; Indexing; Information retrieval; Laboratories; Natural language processing; Power measurement; Testing; Text categorization; Text processing; Divergence-based; feature selection; text classification
fLanguage
English
Publisher
IEEE
Conference_Titel
Natural Language Processing and Knowledge Engineering, 2008. NLP-KE '08. International Conference on
Conference_Location
Beijing
Print_ISBN
978-1-4244-4515-8
Electronic_ISBN
978-1-4244-2780-2
Type
conf
DOI
10.1109/NLPKE.2008.4906808
Filename
4906808