Title :
Weighted naïve Bayes classifier on categorical features
Author :
Omura, K. ; Kudo, Motoi ; Endo, T. ; Murai, Takashi
Author_Institution :
Div. of Comput. Sci., Hokkaido Univ., Sapporo, Japan
Abstract :
Classification problems with many categorical features have become common, as seen in genetic data and text data. In this paper, we discuss several ways to assign weights to features within the framework of the naïve Bayes classifier, that is, under the assumption that features are independent. Because a categorical feature has no order, we consider a histogram over the possible values (bins) of the feature. Taking into account the differences in the number of samples falling into each bin, we propose two kinds of weights: 1) one derived from the probability that the majority class remains the majority in the observed samples, and 2) one reflecting the expected conditional entropy. With the latter entropy weight, we show that more discriminative features gain higher weights and that non-discriminative features diminish as the number of samples goes to infinity. We examine the properties of these two kinds of weights using artificial data and some real-life data.
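The idea of scaling each feature's contribution by an entropy-based weight can be illustrated with a minimal sketch. It is not the weighting proposed in the paper: the abstract does not give the formula for the expected conditional entropy, so the sketch assumes a simple plug-in weight, 1 - H(C|X)/H(C), estimated from bin frequencies, and applies it as a multiplier on each feature's log-likelihood. The function names (`entropy_weight`, `fit`, `predict`), the Laplace smoothing parameter `alpha`, and the toy data are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(counts):
    """Shannon entropy (in bits) of a list of non-negative counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def entropy_weight(column, labels):
    """Assumed weight 1 - H(C|X)/H(C) from plug-in frequency estimates.

    This is an illustrative stand-in, not the paper's expected
    conditional entropy weight.
    """
    h_class = entropy(list(Counter(labels).values()))
    if h_class == 0.0:
        return 0.0  # only one class present: no feature can discriminate
    bins = defaultdict(Counter)  # feature value (bin) -> class histogram
    for value, label in zip(column, labels):
        bins[value][label] += 1
    n = len(labels)
    h_cond = sum(
        (sum(hist.values()) / n) * entropy(list(hist.values()))
        for hist in bins.values()
    )
    return 1.0 - h_cond / h_class


def fit(rows, labels, alpha=1.0):
    """Collect counts, priors, and per-feature weights from training data."""
    columns = [[row[j] for row in rows] for j in range(len(rows[0]))]
    class_counts = Counter(labels)
    return {
        "classes": sorted(class_counts),
        "class_counts": class_counts,
        "log_prior": {c: math.log(k / len(labels)) for c, k in class_counts.items()},
        "weights": [entropy_weight(col, labels) for col in columns],
        "n_bins": [len(set(col)) for col in columns],
        "counts": [Counter(zip(col, labels)) for col in columns],
        "alpha": alpha,
    }


def predict(model, row):
    """Return the class maximizing log P(c) + sum_j w_j * log P(x_j | c)."""
    best, best_score = None, -math.inf
    for c in model["classes"]:
        score = model["log_prior"][c]
        for j, value in enumerate(row):
            # Laplace-smoothed estimate of P(x_j = value | c)
            num = model["counts"][j][(value, c)] + model["alpha"]
            den = model["class_counts"][c] + model["alpha"] * model["n_bins"][j]
            score += model["weights"][j] * math.log(num / den)
        if score > best_score:
            best, best_score = c, score
    return best


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
labels = ["pos", "pos", "neg", "neg"]
model = fit(rows, labels)
print(model["weights"], predict(model, ("a", "y")))
```

On this toy data the discriminative feature receives a larger weight than the noise feature, so the noise feature contributes nothing to the decision. A plug-in estimate like this ignores how few samples may fall into a bin, which is the issue the weights in the paper are meant to address.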
Keywords :
belief networks; entropy; pattern classification; probability; artificial data; categorical features; discriminative features; entropy weight; expected conditional entropy; genetic data; histogram; nondiscriminative feature; probability; real-life data; text data; weighted naïve Bayes classifier; Accuracy; Entropy; Histograms; Intelligent systems; Reliability; Training; Training data; Categorical feature; Confidence weight; Entropy weight; Feature shrinkage; Naïve Bayes;
Conference_Title :
Intelligent Systems Design and Applications (ISDA), 2012 12th International Conference on
Conference_Location :
Kochi
Print_ISBN :
978-1-4673-5117-1
DOI :
10.1109/ISDA.2012.6416651