Title :
The Effect of Features Using Feature Selection for Bayesian Classifier
Author :
Moore, L.; Kambhampati, C.
Author_Institution :
Dept. of Comput. Sci., Univ. of Hull, Kingston upon Hull, UK
Abstract :
Live clinical datasets are characterized by high dimensionality, missing data, class imbalance, non-normal distributions, noise and inconsistency. This paper focuses on missing data and high dimensionality, both of which are known to affect classification and the design of decision support systems. A naive Bayes classifier was employed to explore how both problems can be handled. Experimental results showed that imputing missing values did not improve the performance of the naive Bayes classifier. However, a wrapper subset evaluator that employed forward and backward search strategies to reduce the data dimension did affect performance. The methods identified an optimal feature subset and reduced numerical and time complexity while maintaining a high degree of accuracy. These methods were compared on two datasets: the LifeLab dataset and the Iris dataset.
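The wrapper approach described in the abstract scores candidate feature subsets by the accuracy of the classifier itself, growing the subset (forward search) or shrinking it (backward search). The sketch below illustrates that idea in pure Python under stated assumptions; it is not the authors' implementation, since the record does not specify their tooling, evaluation criterion or stopping rule. It assumes a Gaussian naive Bayes over continuous features, leave-one-out accuracy as the wrapper's score, and a small hypothetical toy dataset (not the LifeLab or Iris data). All function names (`fit_gnb`, `predict_gnb`, `loo_accuracy`, `forward_select`, `backward_eliminate`) are illustrative. A missing value (`None`) is simply dropped from the likelihood product, which is one common way naive Bayes can cope with missing data without imputation.

```python
import math

def fit_gnb(X, y, features):
    """Estimate class priors and per-class Gaussian (mean, variance) for the chosen features.
    Missing values (None) are ignored when estimating the parameters."""
    model = {}
    for c in sorted(set(y)):  # sorted for deterministic iteration order
        rows = [x for x, label in zip(X, y) if label == c]
        stats = {}
        for f in features:
            vals = [r[f] for r in rows if r[f] is not None]
            if not vals:
                continue  # no observed values for this class: leave the feature out
            mu = sum(vals) / len(vals)
            # A small constant keeps the variance strictly positive.
            var = sum((v - mu) ** 2 for v in vals) / len(vals) + 1e-6
            stats[f] = (mu, var)
        model[c] = (len(rows) / len(y), stats)
    return model

def predict_gnb(model, x):
    """Return the class with the highest log posterior (up to a constant)."""
    best, best_score = None, -math.inf
    for c, (prior, stats) in model.items():
        score = math.log(prior)
        for f, (mu, var) in stats.items():
            if x[f] is None:
                continue  # a missing feature drops out of the likelihood product
            score += -0.5 * math.log(2 * math.pi * var) - (x[f] - mu) ** 2 / (2 * var)
        if score > best_score:
            best, best_score = c, score
    return best

def loo_accuracy(X, y, features):
    """Wrapper score: leave-one-out accuracy of naive Bayes restricted to `features`."""
    correct = 0
    for i in range(len(X)):
        model = fit_gnb(X[:i] + X[i + 1:], y[:i] + y[i + 1:], features)
        correct += predict_gnb(model, X[i]) == y[i]
    return correct / len(X)

def forward_select(X, y, n_features):
    """Greedy forward search: add the best feature while accuracy strictly improves."""
    selected, best_acc = [], loo_accuracy(X, y, [])
    remaining = list(range(n_features))
    while remaining:
        scored = [(loo_accuracy(X, y, selected + [f]), f) for f in remaining]
        acc, f = max(scored, key=lambda t: (t[0], -t[1]))  # ties go to the lowest index
        if acc <= best_acc:
            break  # no single addition improves accuracy
        selected.append(f)
        remaining.remove(f)
        best_acc = acc
    return selected, best_acc

def backward_eliminate(X, y, n_features):
    """Greedy backward search: drop a feature while accuracy does not decrease."""
    selected = list(range(n_features))
    best_acc = loo_accuracy(X, y, selected)
    while selected:
        scored = [(loo_accuracy(X, y, [g for g in selected if g != f]), f) for f in selected]
        acc, f = max(scored, key=lambda t: (t[0], -t[1]))
        if acc < best_acc:
            break  # every removal hurts accuracy
        selected.remove(f)  # equal accuracy: prefer the smaller subset
        best_acc = acc
    return selected, best_acc

# Hypothetical toy data: feature 0 separates the classes, feature 1 is pure noise,
# feature 2 is noise with missing entries (None).
X = [
    [1.0, 2, 0.5], [1.1, 3, None], [0.9, 2, 0.7], [1.0, 3, 0.6],
    [5.0, 2, 0.6], [5.1, 3, 0.5], [4.9, 2, None], [5.0, 3, 0.7],
]
y = ["a"] * 4 + ["b"] * 4

if __name__ == "__main__":
    print(forward_select(X, y, 3))
    print(backward_eliminate(X, y, 3))
```

On this toy data both searches settle on the single informative feature. Leave-one-out accuracy is used only because it needs no extra parameters on a tiny dataset; k-fold cross-validation is the more usual wrapper criterion on realistically sized data.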
Keywords :
Bayes methods; data mining; data reduction; decision support systems; feature selection; medical information systems; pattern classification; search problems; Bayesian classifier; Naive Bayes classifier; backward search strategy; class imbalance; data dimension reduction; dataset inconsistency; forward search strategy; high dimensionality; iris dataset; lifelab dataset; live clinical dataset characteristics; missing data; noisy datasets; nonnormal distribution; wrapper subset evaluator; Feature extraction; Heart; Iris; Mathematical model; clinical dataset; heart failure; wrapper subset;
Conference_Title :
Systems, Man, and Cybernetics (SMC), 2013 IEEE International Conference on
Conference_Location :
Manchester
DOI :
10.1109/SMC.2013.790