• DocumentCode
    228701
  • Title

    Parallel Bayesian Network Structure Learning for Genome-Scale Gene Networks

  • Author

    Misra, Sanchit ; Vasimuddin, Md ; Pamnany, Kiran ; Chockalingam, Sriram P. ; Dong, Yong ; Xie, Min ; Aluru, Maneesha R. ; Aluru, Srinivas

  • Author_Institution
    Parallel Comput. Lab., Intel Corp., Bangalore, India
  • fYear
    2014
  • fDate
    16-21 Nov. 2014
  • Firstpage
    461
  • Lastpage
    472
  • Abstract
    Learning Bayesian networks is NP-hard. Even with recent progress in heuristic and parallel algorithms, modeling capabilities still fall short of the scale of the problems encountered. In this paper, we present a massively parallel method for Bayesian network structure learning, and demonstrate its capability by constructing genome-scale gene networks of the model plant Arabidopsis thaliana from over 168.5 million gene expression values. We report strong scaling efficiency of 75% and demonstrate scaling to 1.57 million cores of the Tianhe-2 supercomputer. Our results constitute increases of three and five orders of magnitude over previously published results in the scale of data analyzed and computations performed, respectively. We achieve this through algorithmic innovations: efficient techniques to distribute work across all compute nodes, all available processors and coprocessors on each node, and all available threads on each processor and coprocessor, together with vectorization techniques to maximize single-thread performance.
  • Keywords
    belief networks; biology computing; genetic algorithms; genomics; learning (artificial intelligence); parallel algorithms; NP-hard; Tianhe-2 supercomputer; algorithmic innovation; gene expression value; genome-scale gene networks; heuristic algorithm; learning Bayesian networks; model plant Arabidopsis thaliana; modeling capability; parallel Bayesian network structure learning; parallel algorithm; scaling efficiency; single thread performance; vectorization technique; Bayes methods; Bioinformatics; Coprocessors; Genomics; Hypercubes; Instruction sets; Vectors; Bayesian networks; gene networks; parallel machine learning; systems biology;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    SC14: International Conference for High Performance Computing, Networking, Storage and Analysis
  • Conference_Location
    New Orleans, LA
  • Print_ISBN
    978-1-4799-5499-5
  • Type

    conf

  • DOI
    10.1109/SC.2014.43
  • Filename
    7013025