• DocumentCode
    2450905
  • Title

    pFANGS: Parallel high speed sequence mapping for Next Generation 454-roche Sequencing reads

  • Author

    Misra, Sanchit ; Narayanan, Ramanathan ; Liao, Wei-keng ; Choudhary, Alok ; Lin, Simon

  • Author_Institution
    Electr. Eng. & Comput. Sci., Northwestern Univ., Evanston, IL, USA
  • fYear
    2010
  • fDate
    19-23 April 2010
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    Millions of DNA sequences (reads) are generated by Next Generation Sequencing machines every day. There is a need for high-performance algorithms to map these sequences to the reference genome to identify single nucleotide polymorphisms or rare transcripts to fulfill the dream of personalized medicine. In this paper, we present a high-throughput parallel sequence mapping program, pFANGS. pFANGS is designed to find all the matches of a query sequence in the reference genome while tolerating a large number of mismatches or insertions/deletions. pFANGS partitions the computational workload and data among all the processes and employs load-balancing mechanisms to ensure better process efficiency. Our experiments show that, with 512 processors, we are able to map approximately 31 million 454/Roche queries of length 500 each to a reference human genome per hour, allowing 5 mismatches or insertions/deletions at full sensitivity. We also report and compare the performance results of two alternative parallel implementations of pFANGS: a shared memory OpenMP implementation and an MPI-OpenMP hybrid implementation.
  • Keywords
    DNA; biology computing; genomics; message passing; parallel programming; resource allocation; shared memory systems; DNA sequence; MPI-OpenMP hybrid implementation; Roche query; high performance algorithm; high-throughput parallel sequence mapping program; human genome; load-balancing; next generation 454-Roche sequencing reads; next generation sequencing machine; nucleotide polymorphism; pFANGS program; parallel computing; parallel high speed sequence mapping; personalized medicine; query sequence; shared memory OpenMP implementation; Algorithm design and analysis; Bioinformatics; Computer science; Costs; DNA; Databases; Genomics; Humans; Sequences; Throughput; 454 sequencers; next generation sequencers; parallel computing; sequence mapping;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW), 2010 IEEE International Symposium on
  • Conference_Location
    Atlanta, GA
  • Print_ISBN
    978-1-4244-6533-0
  • Type

    conf

  • DOI
    10.1109/IPDPSW.2010.5470894
  • Filename
    5470894