DocumentCode :
1927006
Title :
Integrating Incremental Feature Weighting into Naïve Bayes Text Classifier
Author :
Kim, Han Joon ; Chang, Jaeyoung
Author_Institution :
Seoul Univ., Seoul
Volume :
2
fYear :
2007
fDate :
19-22 Aug. 2007
Firstpage :
1137
Lastpage :
1143
Abstract :
In a real-world operational environment, text classification systems must cope with an incomplete training set and no prior knowledge of the feature space. In this regard, the most appropriate algorithm for operational text classification is naive Bayes, since its pre-learned classification model and feature space are easy to update incrementally. Our work focuses on improving the naive Bayes classifier through a feature weighting strategy. The basic idea is that parameter estimation in naive Bayes can take into account the degree of feature importance as well as the feature distribution. In addition, we have extended a conventional algorithm for incremental feature update in order to develop a dynamic feature space in an operational environment. Through experiments on the Reuters-21578 and 20 Newsgroups benchmark collections, we show that the traditional multinomial naive Bayes classifier can be significantly improved by χ²-statistic based feature weighting.
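The idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the weighting formula, so the weight `1 + log(1 + max χ²)`, the smoothing constant `alpha`, and the class name `IncrementalWeightedNB` are all assumptions made for illustration. The sketch keeps only raw counts, so a new document (and any new terms it brings) can be folded in without retraining, and the χ² weights are recomputed from those counts at prediction time.

```python
import math
from collections import defaultdict


class IncrementalWeightedNB:
    """Hypothetical multinomial naive Bayes with chi-square term weights."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # smoothing constant (assumed)
        self.n_docs = 0
        self.class_docs = defaultdict(int)      # class -> number of documents
        self.term_counts = defaultdict(dict)    # class -> term -> occurrences
        self.doc_freq = defaultdict(dict)       # class -> term -> documents containing term
        self.vocab = set()                      # grows as new terms arrive

    def update(self, tokens, label):
        """Fold one labeled document into the counts (incremental step)."""
        self.n_docs += 1
        self.class_docs[label] += 1
        for t in tokens:
            self.term_counts[label][t] = self.term_counts[label].get(t, 0) + 1
            self.vocab.add(t)
        for t in set(tokens):
            self.doc_freq[label][t] = self.doc_freq[label].get(t, 0) + 1

    def chi2(self, term, label):
        """Chi-square statistic of the 2x2 term/class contingency table."""
        a = self.doc_freq[label].get(term, 0)               # in class, has term
        b = sum(self.doc_freq[c].get(term, 0)
                for c in list(self.class_docs) if c != label)  # other classes, has term
        c = self.class_docs[label] - a                      # in class, lacks term
        d = self.n_docs - a - b - c                         # other classes, lacks term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        return self.n_docs * (a * d - c * b) ** 2 / denom if denom else 0.0

    def weight(self, term):
        """Assumed weighting: 1 + log(1 + max chi-square over classes)."""
        best = max(self.chi2(term, c) for c in list(self.class_docs))
        return 1.0 + math.log(1.0 + best)

    def predict(self, tokens):
        """Return the class with the highest weighted log-posterior."""
        classes = list(self.class_docs)
        w = {t: self.weight(t) for t in self.vocab}
        scores = {}
        for c in classes:
            counts = self.term_counts[c]
            total = sum(w[t] * n for t, n in counts.items())
            denom = self.alpha * len(self.vocab) + total
            score = math.log(self.class_docs[c] / self.n_docs)
            for t in tokens:
                if t in self.vocab:             # ignore terms never seen so far
                    score += math.log((self.alpha + w[t] * counts.get(t, 0)) / denom)
            scores[c] = score
        return max(scores, key=scores.get)


clf = IncrementalWeightedNB()
for text, label in [("ball goal team", "sports"), ("team win goal", "sports"),
                    ("code bug compile", "tech"), ("compile code release", "tech")]:
    clf.update(text.split(), label)
```

Because the model stores counts rather than fitted probabilities, a later call such as `clf.update("gpu kernel code".split(), "tech")` extends the vocabulary in place, and subsequent predictions use weights recomputed over the enlarged feature space.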
Keywords :
Bayes methods; classification; feature extraction; learning (artificial intelligence); text analysis; dynamic feature space; incomplete training set; incremental feature weighting; naive Bayes text classification systems; operational environment; parameter estimation; pre-learned classification model; Cybernetics; Electronic mail; IP networks; Knowledge engineering; Machine learning; Parameter estimation; Software libraries; Statistics; Text categorization; Web pages; Feature selection; Feature weighting; Naïve Bayes classifier; Text classification; χ²-statistic;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Machine Learning and Cybernetics, 2007 International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
978-1-4244-0973-0
Electronic_ISBN :
978-1-4244-0973-0
Type :
conf
DOI :
10.1109/ICMLC.2007.4370315
Filename :
4370315