• DocumentCode
    560425
  • Title

    Stratified Random Forest for Genome-wide Association Study

  • Author

    Wu, Qingyao ; Ye, Yunming ; Liu, Yang ; Ng, Michael

  • Author_Institution
    Dept. of Comput. Sci., Harbin Inst. of Technol., Shenzhen, China
  • fYear
    2011
  • fDate
    12-15 Nov. 2011
  • Firstpage
    10
  • Lastpage
    15
  • Abstract
    For high-dimensional genome-wide association (GWA) case-control data on complex diseases, a large proportion of single-nucleotide polymorphisms (SNPs) are usually irrelevant to the disease. Simple random sampling in a random forest, using the default mtry parameter to choose the feature subspace, will select too many subspaces that contain no informative SNPs. An exhaustive search for an optimal mtry is therefore often required in order to include useful, relevant SNPs and discard the vast number of non-informative ones. However, such a search is very time-consuming and ill-suited to GWA studies of high-dimensional data. This paper proposes a stratified sampling method for feature subspace selection to generate decision trees in a random forest for high-dimensional GWA data. We employ two genome-wide SNP data sets (Parkinson case-control data comprising 408,803 SNPs and Alzheimer case-control data comprising 380,157 SNPs) to demonstrate that the proposed stratified sampling method is effective and that it can generate a better random forest, with higher accuracy and a lower error bound, than Breiman's random forest generation method.
  • Keywords
    decision trees; diseases; genomics; random processes; sampling methods; complex disease; feature subspace selection; generate decision trees; genome-wide association; random sampling; single-nucleotide polymorphisms; stratified random forest; Bioinformatics; Correlation; Decision trees; Diseases; Radio frequency; Sampling methods; Vegetation; Genome-wide association study; random forest classifier; significant SNP selection; stratified sampling;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and Biomedicine (BIBM), 2011 IEEE International Conference on
  • Conference_Location
    Atlanta, GA
  • Print_ISBN
    978-1-4577-1799-4
  • Type

    conf

  • DOI
    10.1109/BIBM.2011.9
  • Filename
    6120401