• DocumentCode
    2998442
  • Title

    Evaluation of GPU-based Seed Generation for Computational Genomics Using Burrows-Wheeler Transform

  • Author

    Liu, Yongchao ; Schmidt, Bertil

  • Author_Institution
    Inst. für Inf., Johannes Gutenberg Univ. Mainz, Mainz, Germany
  • fYear
    2012
  • fDate
    21-25 May 2012
  • Firstpage
    684
  • Lastpage
    690
  • Abstract
    The unprecedented volume of short reads produced by new high-throughput sequencers makes it challenging to align short reads to reference genomes with both high sensitivity and high speed. Many CPU-based short-read aligners have been developed to address this challenge, and one popular approach among them is the seed-and-extend heuristic. The first and foremost step of this heuristic is to generate seeds between the input reads and the reference genome, for which hash tables are the most frequently used data structure. However, although hash tables usually have nearly constant query time complexity, they are memory-consuming, which makes them poorly suited to memory-stringent many-core architectures such as GPUs. The Burrows-Wheeler transform (BWT) provides a memory-efficient alternative, with the drawback that its query time complexity is a function of query length. In this paper, we investigate GPU-based fixed-length seed generation for computational genomics based on the BWT and the Ferragina-Manzini (FM) index, where k-mers from the reads are searched against a reference genome (indexed using the BWT) to find k-mer matches (i.e., seeds). In addition to exact matches, mismatches are allowed at any position within a seed, unlike spaced seeds, which allow mismatches only at predefined positions. By evaluating the performance of our GPU version relative to an equivalent CPU version, we intend to provide useful guidance for the development of GPU-based seed generators for aligners based on the seed-and-extend paradigm.
  • Keywords
    biology computing; computational complexity; file organisation; genomics; graphics processing units; multiprocessing systems; Burrows-Wheeler transform; CPU-based short read aligners; Ferragina Manzini index; GPU-based seed generation; computational genomics; data structure; hash tables; high-throughput sequencers; memory-stringent many-core architectures; query time complexity; seed-and-extend heuristic; Arrays; Equations; Genomics; Graphics processing unit; Kernel; Memory management; Runtime; CUDA; GPU; Seed generation; seed-and-extend
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2012 IEEE 26th International
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-1-4673-0974-5
  • Type

    conf

  • DOI
    10.1109/IPDPSW.2012.85
  • Filename
    6270707
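
The seed search summarized in the abstract (k-mers from reads searched against a BWT-indexed reference, with mismatches allowed at any position) can be illustrated with a small CPU-side sketch. This is not the authors' GPU implementation: the function names `build_fm_index` and `find_seeds`, the naive suffix-array construction, and the full occurrence table are all assumptions chosen for readability, and a practical index would use sampled tables to save memory.

```python
from collections import Counter


def build_fm_index(reference):
    """Build a toy FM-index: suffix array, C table, and full Occ table.

    Hypothetical helper for illustration; the O(n^2 log n) suffix sort
    is only acceptable for very short references.
    """
    text = reference + "$"  # sentinel that sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to "$").
    bwt = "".join(text[i - 1] for i in sa)

    # C[c] = number of characters in the text strictly smaller than c.
    counts = Counter(text)
    C, total = {}, 0
    for c in sorted(counts):
        C[c] = total
        total += counts[c]

    # occ[c][i] = number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return sa, C, occ


def find_seeds(kmer, sa, C, occ, max_mismatches=0):
    """Return sorted reference positions matching kmer.

    Uses backward search over the half-open suffix-array interval
    [lo, hi). Up to max_mismatches substitutions are allowed at any
    position of the k-mer.
    """
    alphabet = [c for c in C if c != "$"]
    hits = []

    def extend(pos, lo, hi, budget):
        if lo >= hi:  # empty interval: no suffix has this prefix
            return
        if pos < 0:  # whole k-mer consumed: report all positions
            hits.extend(sa[lo:hi])
            return
        for c in alphabet:
            cost = 0 if c == kmer[pos] else 1
            if cost > budget:
                continue
            # One backward-search step per character, so the work
            # grows with the k-mer length.
            extend(pos - 1, C[c] + occ[c][lo], C[c] + occ[c][hi],
                   budget - cost)

    extend(len(kmer) - 1, 0, len(sa), max_mismatches)
    return sorted(hits)


if __name__ == "__main__":
    sa, C, occ = build_fm_index("ACGTACGTTACG")
    print(find_seeds("ACG", sa, C, occ))                    # exact seeds
    print(find_seeds("TTT", sa, C, occ, max_mismatches=1))  # inexact seeds
```

Each exact search performs one interval update per k-mer character, which reflects the query-length-dependent cost the abstract contrasts with hash tables; allowing mismatches adds branching over the alphabet at each position, bounded by the mismatch budget.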