Conference record number :
3752
Article title :
Comparing classification algorithms of data mining in diagnosis of diabetes and assessing the effectiveness of k-fold cross validation in the accuracy of the constructed model
Authors :
Nikbakhsh, Nasim (Nasim.nikbakhsh@gmail.com), Department of Computer, Isfahan (Khorasgan) Branch, Islamic Azad University, Isfahan, Iran
Dehghani, GholamReza (dehghani_ghr@yahoo.com), Department of Computer, Isfahan (Khorasgan) Branch, Islamic Azad University, Isfahan, Iran
Zamani, Farsad (farsad.zamani@yahoo.com), Department of Computer, Isfahan (Khorasgan) Branch, Islamic Azad University, Isfahan, Iran
Number of pages :
6
Keywords :
data mining, classification, KNN, SVM, Naive Bayesian, Decision Tree, k-fold validation
Publication year :
1395 (Solar Hijri)
Conference title :
First International Conference on Computer Engineering and Science (اولين كنفرانس بين المللي مهندسي و علوم كامپيوتر)
Document language :
English
Abstract :
One application of data mining in medicine is building models for disease diagnosis. The more a model learns from previous data, the more accurately it performs. The central issue is that the training and testing data used for classification must be selected so that the model learns as efficiently as possible from previous data and achieves the highest accuracy in diagnosing the disease. In this study, the Pima diabetes dataset is used; models for predicting and diagnosing diabetes are developed with the KNN, SVM, Naive Bayesian, and Decision Tree classification methods, and the accuracy of each model is evaluated. The effect of k-fold cross validation on the accuracy of each model is then assessed. According to the findings, k-fold cross validation increases model accuracy, and no single classification technique always achieves the best performance and accuracy; the best choice depends on the nature and complexity of the dataset. The simulation is carried out in RapidMiner.
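The k-fold procedure mentioned in the abstract can be sketched in a few lines. The paper's experiments were run in RapidMiner, so the following is not the authors' implementation: it is a minimal, standard-library Python illustration in which the function names (`k_fold_indices`, `knn_predict`, `cross_val_accuracy`), the choice of a 3-nearest-neighbor classifier, and the synthetic two-feature data are all assumptions made for the example. The data are randomly generated and stand in for a real dataset; they are not the Pima data.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices 0..n_samples-1 and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def knn_predict(train_X, train_y, x, n_neighbors=3):
    """Majority vote among the n_neighbors training points closest to x."""
    by_distance = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    votes = Counter(train_y[i] for i in by_distance[:n_neighbors])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(X, y, k=5, n_neighbors=3, seed=0):
    """Mean accuracy over k folds; each fold serves as the test set exactly once."""
    folds = k_fold_indices(len(X), k, seed)
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(
            knn_predict(train_X, train_y, X[i], n_neighbors) == y[i]
            for i in test_idx
        )
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Synthetic two-class data (illustrative only, not the Pima dataset).
rng = random.Random(42)
X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)] + \
    [(rng.gauss(4, 1), rng.gauss(4, 1)) for _ in range(50)]
y = [0] * 50 + [1] * 50

mean_accuracy = cross_val_accuracy(X, y, k=5)
print(f"5-fold mean accuracy: {mean_accuracy:.3f}")
```

Because every sample is used for testing exactly once and for training in the remaining folds, the averaged score depends less on one particular train/test split than a single hold-out estimate would. Swapping `knn_predict` for another classifier would allow the same loop to compare methods on equal footing.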
Country :
Iran