DocumentCode :
1790241
Title :
Developing disease risk prediction model based on environmental factors
Author :
Mingyu Pak ; Miyoung Shin
Author_Institution :
Sch. of Electron. Eng., Kyungpook Nat. Univ., Daegu, South Korea
fYear :
2014
fDate :
22-25 June 2014
Firstpage :
1
Lastpage :
2
Abstract :
Analyzing the effects of environmental factors on human diseases is an important issue in recent bioinformatics studies. In this paper, we investigate several environmental factors related to Type-2 diabetes and select a subset of them to develop an analytical model for disease risk prediction. To select significant factors, we first preprocessed all the environmental factors into categorical values and then calculated the max/min odds ratios of all the categorized environmental factors. We then chose the top-n ranked factors as input features for the prediction model. The disease risk prediction model was developed with SVM classifiers, with training data built from the Ansan/Ansung Cohort 2 Data obtained from the Korean National Institute of Health (KNIH). The training data exhibited a class imbalance problem, which is often observed in practice. To handle this problem, we regenerated the training data using the SMOTE approach and used the result for disease risk prediction modeling. For model evaluation, the proposed method was employed to predict the risk of Type-2 diabetes. The experimental results showed that our SVM classifiers based on selected environmental factors could produce results comparable to those of a prediction model with genetic factors in forecasting the risk of a specific disease.
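The factor-ranking and oversampling steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not define "max/min odds ratio", so the sketch assumes one plausible reading (for each factor, compute the odds ratio of each category against all other categories, then score the factor by the ratio of its largest to its smallest category odds ratio). The function names, the 0.5 continuity correction, and the toy data are all hypothetical. The `smote` function follows the standard SMOTE idea of interpolating between a minority sample and one of its nearest minority neighbours, and treats features as numeric. The SVM training step is omitted.

```python
import random

def category_odds_ratios(values, labels):
    """Odds ratio of disease for each category versus all other categories."""
    ratios = {}
    for cat in set(values):
        # 2x2 table counts; 0.5 is added to each cell (Haldane-Anscombe
        # correction, an assumption here) so empty cells do not divide by zero.
        a = sum(1 for v, y in zip(values, labels) if v == cat and y == 1) + 0.5
        b = sum(1 for v, y in zip(values, labels) if v == cat and y == 0) + 0.5
        c = sum(1 for v, y in zip(values, labels) if v != cat and y == 1) + 0.5
        d = sum(1 for v, y in zip(values, labels) if v != cat and y == 0) + 0.5
        ratios[cat] = (a * d) / (b * c)
    return ratios

def rank_factors(table, labels, top_n):
    """Score each factor by max/min category odds ratio; keep the top_n names."""
    scores = {}
    for name, values in table.items():
        ratios = category_odds_ratios(values, labels)
        scores[name] = max(ratios.values()) / min(ratios.values())
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Nearest neighbours by squared Euclidean distance, excluding x itself.
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((u - v) ** 2 for u, v in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append([u + gap * (v - u) for u, v in zip(x, nb)])
    return synthetic

# Hypothetical toy data: 1 = disease case, 0 = control.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
table = {
    "smoking": ["yes", "yes", "yes", "no", "no", "no", "no", "no"],
    "region": ["a", "b", "a", "b", "a", "b", "a", "b"],
}
top = rank_factors(table, labels, top_n=1)
print(top)  # ['smoking']

# Oversample a small minority class with five synthetic points.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
synthetic = smote(minority, n_new=5)
```

In a fuller pipeline, the selected factors would be encoded numerically, the minority class oversampled on the training split only, and the result passed to an SVM classifier.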
Keywords :
bioinformatics; diseases; environmental factors; genetics; prediction theory; risk analysis; support vector machines; Ansan/Ansung Cohort 2 Data; KNIH; Korean National Institute of Health; SMOTE approach; SVM classifiers; Type-2 diabetes; bioinformatics studies; categorical values; data imbalanced problem; disease risk prediction model; environmental factors; genetic factors; human diseases; max/min odds ratios; top-n ranked factors; training data regeneration; Data models; Diabetes; Diseases; Environmental factors; Genetics; Predictive models; Support vector machines; Environmental-wide association study; SVM classifiers; disease risk prediction;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Consumer Electronics (ISCE 2014), The 18th IEEE International Symposium on
Conference_Location :
Jeju Island
Type :
conf
DOI :
10.1109/ISCE.2014.6884338
Filename :
6884338