• DocumentCode
    560168
  • Title

    Highly scalable ab initio genomic motif identification

  • Author

    Marchand, Benoît ; Bajic, Vladimir B. ; Kaushik, Dinesh K.

  • Author_Institution
    Comput. Biosci. Res. Center (CBRC), King Abdullah Univ. of Sci. & Technol., Thuwal, Saudi Arabia
  • fYear
    2011
  • fDate
    12-18 Nov. 2011
  • Firstpage
    1
  • Lastpage
    10
  • Abstract
    We present results of scaling an ab initio motif family identification system, Dragon Motif Finder (DMF), to 65,536 processor cores of IBM Blue Gene/P. DMF seeks groups of mutually similar polynucleotide patterns within a set of genomic sequences and builds various motif families from them. Such information is of relevance to many problems in life sciences. Prior attempts to scale such ab initio motif-finding algorithms achieved limited success. We solve the scalability issues using a combination of mixed-mode MPI-OpenMP parallel programming, master-slave work assignment, multi-level workload distribution, multi-level MPI collectives, and serial optimizations. While the scalability of our algorithm was excellent (94% parallel efficiency on 65,536 cores relative to 256 cores on a modest-size problem), the final speedup with respect to the original serial code exceeded 250,000 when serial optimizations are included. This enabled us to carry out many large-scale ab initio motif-finding simulations in a few hours while the original serial code would have needed decades of execution time.
  • Keywords
    ab initio calculations; application program interfaces; genomics; message passing; optimisation; parallel programming; ab initio motif family identification system; ab initio motif-finding simulation; dragon motif finder; genomic sequences; master-slave work assignment; mixed-mode MPI-OpenMP parallel programming; motif-finding algorithm; multilevel MPI collectives; multilevel workload distribution; processor cores; scalability issue; scalable ab initio genomic motif identification; serial code; serial optimization; similar polynucleotide pattern; Genomics; Instruction sets; Master-slave; Message systems; Optimization; Parallel processing; Scalability; Mixed-mode MPI-OpenMP parallel processing; data-flow parallel processing; master-slave MPI parallel processing; multi-level MPI collective operations; multi-level workload distribution;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High Performance Computing, Networking, Storage and Analysis (SC), 2011 International Conference for
  • Conference_Location
    Seattle, WA
  • Electronic_ISBN
    978-1-4503-0771-0
  • Type

    conf

  • Filename
    6114434