• DocumentCode
    1790241
  • Title

    Developing disease risk prediction model based on environmental factors

  • Author

    Mingyu Pak ; Miyoung Shin

  • Author_Institution
    Sch. of Electron. Eng., Kyungpook Nat. Univ., Daegu, South Korea
  • fYear
    2014
  • fDate
    22-25 June 2014
  • Firstpage
    1
  • Lastpage
    2
  • Abstract
    Analyzing the effects of various environmental factors on human diseases is an important issue in recent bioinformatics studies. In this paper, we investigate several environmental factors related to Type-2 diabetes and select some of them to develop an analytical model for disease risk prediction. To select significant factors, we first preprocessed all the environmental factors into categorical values and then calculated the max/min odds ratios of all the categorized environmental factors. We then chose the top-n ranked factors as input features for the prediction model. The disease risk prediction model was developed with SVM classifiers, with training data built from Ansan/Ansung Cohort 2 Data obtained from the Korean National Institute of Health (KNIH). The training data exhibited a class imbalance problem, which is often observed in practice. To handle this problem, we regenerated the training data using the SMOTE approach and used them for disease risk prediction modeling. For model evaluation, the proposed method was employed to predict the risk of Type-2 diabetes. The experimental results showed that our SVM classifiers based on selected environmental factors could produce results comparable to those of a prediction model with genetic factors in forecasting the risk of a specific disease.
  • Keywords
    bioinformatics; diseases; environmental factors; genetics; prediction theory; risk analysis; support vector machines; Ansan/Ansung Cohort 2 Data; KNIH; Korean National Institute of Health; SMOTE approach; SVM classifiers; Type-2 diabetes; bioinformatics studies; categorical values; data imbalanced problem; disease risk prediction model; environmental factors; genetic factors; human diseases; max/min odds ratios; top-n ranked factors; training data regeneration; Data models; Diabetes; Diseases; Environmental factors; Genetics; Predictive models; Support vector machines; Environmental-wide association study; SVM classifiers; disease risk prediction;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Consumer Electronics (ISCE 2014), The 18th IEEE International Symposium on
  • Conference_Location
    JeJu Island
  • Type

    conf

  • DOI
    10.1109/ISCE.2014.6884338
  • Filename
    6884338
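
The feature-selection and resampling steps described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. The abstract does not define "max/min odds ratios", so the score used here (largest per-category odds ratio divided by the smallest, each category compared against all others) is only one possible reading. The 0.5 continuity correction, the function names, and the simplified SMOTE-style interpolation (which assumes numerically encoded features) are likewise assumptions made for illustration, and the SVM training step is omitted.

```python
import random
from collections import Counter


def odds_ratio(a, b, c, d):
    """Odds ratio of a 2x2 table, with 0.5 added to each cell (assumed correction).

    a: cases in category, b: controls in category,
    c: cases outside category, d: controls outside category.
    """
    a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return (a * d) / (b * c)


def category_odds_ratios(values, labels):
    """Odds ratio of each category versus all other categories (label 1 = case)."""
    cases = Counter(v for v, y in zip(values, labels) if y == 1)
    controls = Counter(v for v, y in zip(values, labels) if y == 0)
    n_cases, n_controls = sum(cases.values()), sum(controls.values())
    return {
        cat: odds_ratio(cases[cat], controls[cat],
                        n_cases - cases[cat], n_controls - controls[cat])
        for cat in set(values)
    }


def rank_factors(table, labels, top_n):
    """Rank factors by (max OR / min OR) over their categories; return top_n names.

    table maps a factor name to its list of categorical values, one per subject.
    """
    scores = {}
    for name, values in table.items():
        ors = category_odds_ratios(values, labels).values()
        scores[name] = max(ors) / min(ors)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by interpolating between a sample
    and one of its k nearest minority neighbours (simplified SMOTE idea).

    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((u - v) ** 2 for u, v in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([u + t * (v - u) for u, v in zip(x, nb)])
    return synthetic
```

In such a pipeline, the factors returned by `rank_factors` would be encoded numerically, the minority class would be oversampled with `smote_like` (or an established SMOTE implementation), and the balanced data would then be passed to an SVM classifier.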