DocumentCode :
1789054
Title :
Prediction of diseases by cascading clustering and classification
Author :
Sumana, B.V. ; Santhanam, T.
Author_Institution :
Dept. of Comput. Sci., Vijaya Coll. Jayanagar, Bangalore, India
fYear :
2014
fDate :
10-11 Oct. 2014
Firstpage :
1
Lastpage :
8
Abstract :
Disease diagnosis is one of the application areas in which data mining techniques help extract knowledge from medical databases. Recently, researchers have been investigating the effect of cascading more than one technique, which has shown enhanced results in disease diagnosis. This paper proposes a hybrid model that uses K-means as a preprocessing algorithm. The proposed model is developed in four stages. In the first stage, datasets selected from the UCI repository are cleaned by deleting all instances with missing values. In the second stage, the Best First search algorithm and correlation-based feature selection (CFS) are used in a cascaded fashion to select relevant features. In the third stage, the resulting binary-class datasets are clustered into two segments using K-means, and incorrectly clustered samples are eliminated to obtain the final samples. Finally, the correctly clustered samples from the previous stage are used to train 12 different classifiers to build the final classifier model, using stratified 10-fold cross-validation. Experimental results showed that cascading K-means clustering and classification, with CFS and Best First as the feature selection method, achieved an average classification accuracy of 95% or above on 5 different medical datasets with all 12 classifiers.
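The third stage described in the abstract (cluster a binary-class dataset into two segments with K-means, then discard the incorrectly clustered samples) might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `kmeans_two_clusters` and `drop_misclustered`, the deterministic centroid initialisation, the majority-vote mapping from clusters to classes, and the toy data are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def kmeans_two_clusters(points, iterations=100):
    """Plain Lloyd's k-means with k=2.

    Assumption: centroids start at the first point and the point
    farthest from it, so the result is deterministic.
    """
    first = points[0]
    second = max(points, key=lambda p: math.dist(p, first))
    centroids = [list(first), list(second)]
    assignments = None
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        new_assignments = [
            min((0, 1), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no assignment changed
        assignments = new_assignments
        # Move each centroid to the mean of its members.
        for c in (0, 1):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments


def drop_misclustered(points, labels):
    """Keep only samples whose cluster agrees with their class label.

    Assumption: each cluster is mapped to the majority class of its
    members; the abstract does not say how this mapping is made.
    """
    assignments = kmeans_two_clusters(points)
    cluster_to_class = {}
    for c in (0, 1):
        member_labels = [y for y, a in zip(labels, assignments) if a == c]
        if member_labels:
            cluster_to_class[c] = Counter(member_labels).most_common(1)[0][0]
    return [
        (p, y)
        for p, y, a in zip(points, labels, assignments)
        if cluster_to_class.get(a) == y
    ]


# Toy binary-class data (hypothetical); the last sample sits in the
# class-0 region but carries label 1, so it is removed.
points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),
          (1.1, 1.0)]
labels = [0, 0, 0, 1, 1, 1, 1]
kept = drop_misclustered(points, labels)
```

In this sketch the retained samples in `kept` would then be passed to the classifiers of the fourth stage. Feature selection (CFS with Best First) and the stratified 10-fold cross-validation are not shown.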
Keywords :
data mining; diseases; medical diagnostic computing; medical information systems; pattern clustering; search problems; CFS; K-means model; Stratified 10 fold cross validation; UCI repository; best first search algorithm; cascaded K-means clustering; classification; correlation based feature selection; data mining; disease diagnosis; diseases prediction; knowledge extraction; medical database; Accuracy; Classification algorithms; Data mining; Data models; Diseases; Heart; Medical diagnostic imaging; Classification; Clustering; Correlation based feature selection (CFS); K-means; hybrid;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Advances in Electronics, Computers and Communications (ICAECC), 2014 International Conference on
Conference_Location :
Bangalore
Type :
conf
DOI :
10.1109/ICAECC.2014.7002426
Filename :
7002426