Title of article :
A novel hybrid undersampling method for mining unbalanced datasets in banking and insurance
Author/Authors :
Sundarkumar, G. Ganesh and Ravi, Vadlamani
Issue Information :
Journal issue with serial number, year 2015
Pages :
10
From page :
368
To page :
377
Abstract :
In this paper, we propose a novel hybrid approach that rectifies the data imbalance problem by employing k Reverse Nearest Neighbourhood and One-Class Support Vector Machine (OCSVM) in tandem. We mined an automobile insurance fraud detection dataset and a customer credit card churn prediction dataset to demonstrate the effectiveness of the proposed model. Throughout the paper, we followed the 10-fold cross-validation method of testing using Decision Tree (DT), Support Vector Machine (SVM), Logistic Regression (LR), Probabilistic Neural Network (PNN), Group Method of Data Handling (GMDH) and Multi-Layer Perceptron (MLP). DT and SVM yielded high sensitivities of 90.74% and 91.89%, respectively, on the insurance dataset, and DT, SVM and GMDH produced high sensitivities of 91.2%, 87.7% and 83.1%, respectively, on the credit card churn prediction dataset. On the insurance fraud detection dataset, we found no statistically significant difference between DT (J48) and SVM; as DT yields "if-then" rules, we prefer DT over SVM. On the churn prediction dataset, GMDH, SVM and LR turned out not to be statistically different, and GMDH yielded a very high area under the ROC curve. Further, DT yielded just 4 "if-then" rules on the insurance dataset and 10 rules on the churn prediction dataset, which is a significant outcome of the study.
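The abstract names k Reverse Nearest Neighbourhood as one of the two building blocks but does not spell out how it is applied. The following is a minimal Python sketch of the general reverse-neighbour idea only: for each majority-class sample, count how many other samples list it among their own k nearest neighbours. The function names, the toy data, and the rule of dropping samples with a count of zero are illustrative assumptions, not the authors' procedure, and the OCSVM stage is not shown.

```python
from math import dist  # Euclidean distance, Python 3.8+


def k_nearest(points, i, k):
    """Indices of the k points closest to points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: dist(points[i], points[j]))
    return others[:k]


def reverse_neighbor_counts(points, k):
    """For each point, count how many other points have it among their k nearest neighbours."""
    counts = [0] * len(points)
    for i in range(len(points)):
        for j in k_nearest(points, i, k):
            counts[j] += 1
    return counts


def drop_isolated(points, k):
    """Assumed filtering rule: keep only points that are a k-nearest neighbour of some other point."""
    counts = reverse_neighbor_counts(points, k)
    return [p for p, c in zip(points, counts) if c > 0]


# Hypothetical majority-class samples: a tight cluster plus one far-away point.
majority = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
kept = drop_isolated(majority, k=2)
```

In this toy data the distant point is nobody's nearest neighbour, so it is the only sample the assumed rule removes. The brute-force search is quadratic in the number of samples, which is acceptable for a sketch but not for large datasets.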
Keywords :
Insurance fraud detection, Credit card churn prediction, Undersampling, k-Reverse Nearest Neighbourhood method, One-class support vector machine
Journal title :
Engineering Applications of Artificial Intelligence
Serial Year :
2015
Record number :
2126364