Title :
SVM ranking with backward search for feature selection in type II diabetes databases
Author :
Balakrishnan, Sarojini ; Narayanaswamy, Ramaraj ; Savarimuthu, Nickolas ; Samikannu, Rita
Author_Institution :
Dept. of Comput. Applic., K.L.N. Coll. of Inf. Technol., Madurai
Abstract :
Clinical databases have accumulated large quantities of information about patients and their clinical history. Data mining is the search for relationships and patterns within this data that could provide useful knowledge for effective decision-making. Classification analysis is one of the most widely adopted data mining techniques in healthcare applications, where it supports medical diagnosis, improves the quality of patient care, and so on. Medical databases are usually high-dimensional. If a training dataset contains irrelevant features (i.e., attributes), classification analysis may produce less accurate results. Data pre-processing is required to prepare the data for data mining and machine learning and to increase predictive accuracy. Feature selection is a pre-processing technique commonly used on high-dimensional data; its purposes include reducing dimensionality, removing irrelevant and redundant features, reducing the amount of data needed for learning, improving algorithms' predictive accuracy, and increasing the comprehensibility of the constructed models. Much research work in data mining has gone into improving the predictive accuracy of classifiers by applying feature selection techniques. Feature selection is especially valuable in medical data mining because it allows a disease to be diagnosed with a minimum number of features. It may provide the means to reduce the number of clinical measurements made while still maintaining or even enhancing accuracy and reducing false negative rates. In medical diagnosis, a reduction in the false negative rate can, literally, be the difference between life and death. In this paper we propose a feature selection approach for finding an optimum feature subset that enhances the classification accuracy of the Naive Bayes classifier. Experiments were conducted on the Pima Indian Diabetes Dataset to assess the effectiveness of our approach. The results confirm that the SVM Ranking with Backward Search approach leads to a promising improvement in feature selection and enhances classification accuracy.
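The abstract does not spell out the procedure, so the following is only a minimal sketch of one plausible reading of "SVM ranking with backward search": features are ranked by the absolute weights of a linear SVM, then removed one at a time, least important first, as long as the cross-validated accuracy of a Gaussian Naive Bayes classifier does not drop. The ranking criterion, the stopping rule, the helper names (`svm_weights`, `nb_accuracy`, `backward_search`), the pure-Python stand-ins for the SVM and Naive Bayes, and the synthetic data (used here in place of the Pima Indian Diabetes Dataset) are all assumptions made for illustration, not the authors' implementation.

```python
import math
import random

def svm_weights(X, y, epochs=50, lam=0.01, lr=0.05):
    """Fit a linear SVM (hinge loss + L2 penalty) by subgradient descent."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):  # labels are assumed to be -1 / +1
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            violated = margin < 1
            for j in range(d):
                w[j] -= lr * (lam * w[j] - (yi * xi[j] if violated else 0.0))
            if violated:
                b += lr * yi
    return w

def nb_accuracy(X, y, feats, folds=5):
    """Cross-validated accuracy of Gaussian Naive Bayes on a feature subset."""
    correct = 0
    for k in range(folds):
        train = [i for i in range(len(X)) if i % folds != k]
        test = [i for i in range(len(X)) if i % folds == k]
        model = {}
        for c in (-1, 1):
            rows = [X[i] for i in train if y[i] == c]
            params = []
            for j in feats:
                vals = [r[j] for r in rows]
                mu = sum(vals) / len(vals)
                var = sum((v - mu) ** 2 for v in vals) / len(vals) + 1e-9
                params.append((j, mu, var))
            model[c] = (math.log(len(rows) / len(train)), params)
        for i in test:
            def score(c):
                log_prior, params = model[c]
                return log_prior + sum(
                    -0.5 * math.log(2 * math.pi * var)
                    - (X[i][j] - mu) ** 2 / (2 * var)
                    for j, mu, var in params)
            correct += max((-1, 1), key=score) == y[i]
    return correct / len(X)

def backward_search(X, y):
    """Drop low-ranked features while Naive Bayes accuracy does not decrease."""
    w = svm_weights(X, y)
    ranked = sorted(range(len(w)), key=lambda j: abs(w[j]))  # least important first
    selected = list(range(len(w)))
    best = nb_accuracy(X, y, selected)
    for j in ranked:
        if len(selected) == 1:
            break
        trial = [f for f in selected if f != j]
        acc = nb_accuracy(X, y, trial)
        if acc >= best:  # assumed rule: keep the removal if accuracy holds
            selected, best = trial, acc
    return selected, best

# Synthetic stand-in data: features 0 and 1 carry signal, 2-4 are pure noise.
random.seed(0)
y = [1 if i % 2 == 0 else -1 for i in range(200)]
X = [[1.5 * yi + random.gauss(0, 1), 1.0 * yi + random.gauss(0, 1)]
     + [random.gauss(0, 1) for _ in range(3)] for yi in y]

baseline = nb_accuracy(X, y, list(range(5)))
selected, best = backward_search(X, y)
print("selected features:", selected)
print("accuracy with all features:", baseline, "after selection:", best)
```

Ranking by SVM weight is only meaningful when features are on comparable scales, so real clinical data would need to be standardized first. In practice an established library implementation of the SVM and Naive Bayes classifiers would likely replace the hand-written stand-ins used here.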
Keywords :
Bayes methods; data mining; health care; learning (artificial intelligence); medical diagnostic computing; pattern classification; support vector machines; SVM ranking; classification analysis; clinical database; feature selection; healthcare application; machine learning; medical data mining; medical database; medical diagnosis; naive Bayes classifier; type II diabetes databases; Accuracy; Data mining; Decision making; Diabetes; History; Medical diagnosis; Medical diagnostic imaging; Spatial databases; Support vector machine classification; Support vector machines; Feature selection; SVM; backward search; classification accuracy; false negative rate; predictive accuracy;
Conference_Title :
Systems, Man and Cybernetics, 2008. SMC 2008. IEEE International Conference on
Conference_Location :
Singapore
Print_ISBN :
978-1-4244-2383-5
Electronic_ISBN :
1062-922X
DOI :
10.1109/ICSMC.2008.4811692