• DocumentCode
    166687
  • Title
    UPC++ for bioinformatics: A case study using genome-wide association studies
  • Author
    Kassens, Jan C. ; Gonzalez-Dominguez, Jorge ; Wienbrandt, Lars ; Schmidt, Bertil
  • Author_Institution
    Dept. of Comput. Sci., Christian-Albrechts-Univ. of Kiel, Kiel, Germany
  • fYear
    2014
  • fDate
    22-26 Sept. 2014
  • Firstpage
    248
  • Lastpage
    256
  • Abstract
    Modern genotyping technologies are able to obtain up to a few million genetic markers (such as SNPs) of an individual within a few minutes. Detecting epistasis, such as SNP-SNP interactions, in Genome-Wide Association Studies is an important but time-consuming operation since statistical computations have to be performed for each pair of measured markers. Therefore, a variety of HPC architectures have been used to accelerate these studies. In this work we present a parallel approach for multi-core clusters, which is implemented with UPC++ and takes advantage of the features available in the Partitioned Global Address Space and Object-Oriented Programming models. Our solution is based on a well-known regression model (used by the popular BOOST tool) to test SNP-pair interactions. Experimental results show that UPC++ is suitable for parallelizing data-intensive bioinformatics applications on clusters. For instance, it reduces the time to analyze a real-world dataset with more than 500,000 SNPs and 5,000 individuals from several days when using a single core to less than one minute using 512 nodes (12,288 cores) of a Cray XC30 supercomputer.
  • Keywords
    C++ language; Cray computers; bioinformatics; genetics; genomics; multiprocessing systems; object-oriented programming; parallel architectures; regression analysis; BOOST tool; Cray XC30 supercomputer; HPC architectures; SNP-SNP interactions; SNP-pairs interactions; UPC++; data-intensive bioinformatics applications; epistasis detection; genetic markers; genome-wide association studies; modern genotyping technologies; multicore clusters; object oriented programming models; parallel approach; partitioned global address space; real-world dataset; regression model; Bioinformatics; Computational modeling; Diseases; Electronics packaging; Genetics; Object oriented modeling; Optimization; Bioinformatics; GWAS; PGAS; UPC++;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cluster Computing (CLUSTER), 2014 IEEE International Conference on
  • Conference_Location
    Madrid
  • Type
    conf
  • DOI
    10.1109/CLUSTER.2014.6968770
  • Filename
    6968770