Title of article :
Evaluating the High Risk Groups for Suicide: A Comparison of Logistic Regression, Support Vector Machine, Decision Tree and Artificial Neural Network
Author/Authors :
AMINI, Payam Dept. of Epidemiology & Reproductive Health - Reproductive Epidemiology Research Center - Royan Institute for Reproductive Biomedicine - ACECR, Tehran; AHMADINIA, Hasan Dept. of Biostatistics & Epidemiology - Hamadan University of Medical Sciences, Hamadan; POOROLAJAL, Jalal Research Center for Health Sciences and Dept. of Biostatistics & Epidemiology - Hamadan University of Medical Sciences, Hamadan; MOQADDASI AMIRI, Mohammad Dept. of Biostatistics & Epidemiology - Hamadan University of Medical Sciences, Hamadan
Abstract :
Background: We aimed to evaluate high-risk groups for suicide using different classification methods, including logistic regression (LR), decision tree (DT), artificial neural network (ANN), and support vector machine (SVM).
Methods: We used the dataset from a study of risk factors for completed suicide conducted in Hamadan Province, western Iran, in 2010. To evaluate the high-risk groups for suicide, LR, SVM, DT, and ANN were applied. The methods were compared using sensitivity, specificity, positive predictive value, negative predictive value, accuracy, and the area under the ROC curve (AUC). Cochran's Q test was used to check for differences in proportions among the methods. To assess the association between observed and predicted values, the phi (φ) coefficient, contingency coefficient, and Kendall's tau-b were calculated.
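The comparison criteria and Cochran's Q test named in the Methods can be sketched as follows. This is a minimal standard-library illustration, not the authors' code: the function names (confusion_metrics, auc_from_scores, cochran_q) are hypothetical, and coding the Cochran's Q outcome as 1 when a method classifies a subject correctly is an assumption, since the abstract does not say which proportion was compared.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Metrics:
    sensitivity: float
    specificity: float
    ppv: float  # positive predictive value
    npv: float  # negative predictive value
    accuracy: float


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Metrics:
    """Metrics from binary labels (assumed coding: 1 = fatal attempt, 0 = non-fatal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return Metrics(
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
        ppv=tp / (tp + fp),
        npv=tn / (tn + fn),
        accuracy=(tp + tn) / len(pairs),
    )


def auc_from_scores(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(random positive scores above random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cochran_q(table: Sequence[Sequence[int]]) -> float:
    """Cochran's Q for a subjects-by-methods table of 0/1 outcomes."""
    k = len(table[0])  # number of methods compared
    col_totals = [sum(row[j] for row in table) for j in range(k)]
    row_totals = [sum(row) for row in table]
    n = sum(row_totals)  # total number of 1s in the table
    denom = k * n - sum(r * r for r in row_totals)
    if denom == 0:
        raise ValueError("Q is undefined when every subject has identical outcomes across methods")
    return (k - 1) * (k * sum(c * c for c in col_totals) - n * n) / denom
```

Under the null hypothesis of equal proportions, Q is compared with a chi-square distribution on k - 1 degrees of freedom (for example with scipy.stats.chi2.sf(q, k - 1) if SciPy is available). When a classifier outputs only hard 0/1 labels, auc_from_scores reduces to (sensitivity + specificity) / 2.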
Results: Gender, age, and occupation were the most important risk factors for fatal suicide attempts common to all four methods. SVM showed the highest accuracy: 0.68 for the training sample and 0.67 for the testing sample. It also yielded the highest specificity (0.67 for the training and 0.68 for the testing sample) and the highest sensitivity for the training sample (0.85), but the lowest sensitivity for the testing sample (0.53). Cochran's Q test showed differences in proportions among the methods (P<0.001). For the association between SVM predictions and observed values, the φ coefficient, contingency coefficient, and Kendall's tau-b were 0.239, 0.232, and 0.239, respectively.
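For a 2x2 table of observed versus predicted classes, the three association measures reported above are closely linked: Kendall's tau-b equals φ, and the contingency coefficient C = sqrt(χ²/(χ² + n)) equals sqrt(φ²/(1 + φ²)) when χ² is Pearson's statistic without continuity correction. The sketch below, a hypothetical illustration rather than the authors' code, computes each measure independently from assumed cell counts a, b, c, d (rows = observed 0/1, columns = predicted 0/1).

```python
import math


def phi_2x2(a: int, b: int, c: int, d: int) -> float:
    """Phi coefficient; a=obs0/pred0, b=obs0/pred1, c=obs1/pred0, d=obs1/pred1."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square without continuity correction."""
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    rows = [a + b, c + d]
    cols = [a + c, b + d]
    return sum(
        (observed[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i in range(2)
        for j in range(2)
    )


def contingency_coefficient(a: int, b: int, c: int, d: int) -> float:
    """Pearson's contingency coefficient C = sqrt(chi2 / (chi2 + n))."""
    chi2 = pearson_chi2_2x2(a, b, c, d)
    return math.sqrt(chi2 / (chi2 + a + b + c + d))


def kendall_tau_b_2x2(a: int, b: int, c: int, d: int) -> float:
    """Kendall's tau-b from pair counts, with ties on each variable removed."""
    n = a + b + c + d
    n0 = n * (n - 1) / 2  # all pairs of subjects
    n1 = sum(r * (r - 1) / 2 for r in (a + b, c + d))  # pairs tied on observed value
    n2 = sum(k * (k - 1) / 2 for k in (a + c, b + d))  # pairs tied on prediction
    concordant = a * d  # (0,0) paired with (1,1)
    discordant = b * c  # (0,1) paired with (1,0)
    return (concordant - discordant) / math.sqrt((n0 - n1) * (n0 - n2))


def contingency_from_phi(phi: float) -> float:
    """C implied by phi for a 2x2 table: sqrt(phi^2 / (1 + phi^2))."""
    return math.sqrt(phi * phi / (1 + phi * phi))
```

Applied to the reported φ of 0.239, contingency_from_phi(0.239) gives about 0.2325, in line with the reported contingency coefficient of 0.232, and the identical φ and tau-b values (0.239) are what the 2x2 identity predicts.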
Conclusion: SVM performed best in classifying fatal suicide attempts compared with DT, LR, and ANN.
Keywords :
Suicide, Support vector machine, Neural networks, Logistic regression, Decision tree, Classification