• DocumentCode
    2989935
  • Title

    High performance implementation of planted motif problem using suffix trees

  • Author

    Dasari, Naga Shailaja ; Ranjan, Desh ; Zubair, Mohammad

  • Author_Institution
    Old Dominion Univ., Norfolk, VA, USA
  • fYear
    2011
  • fDate
    4-8 July 2011
  • Firstpage
    200
  • Lastpage
    206
  • Abstract
    In this paper we present a high performance implementation of suffix tree based solution to the planted motif problem on two different parallel architectures: NVIDIA GPU and Intel Multicore machines. An (I,d) planted motif problem(PMP) is defined as: Given a sequence of n DNA sequences, each of length L, find M, the set of sequences (or motifs) of length I which have at least one d-neighbor in each of the n sequences. Here, a d-neighbor of a sequence is a sequence of same length that differs in at-most d positions. PMP is a well studied problem in computational biology. It is useful in developing methods for finding transcription factor binding sites, sequence classification and for building phylogenetic trees. The problem is computationally challenging to solve, for example a (19,7) PMP takes 9.9 hours on a sequential machine. Many approaches to solve planted motif problem can be found in literature. One approach is based on use of suffix tree data structure. Though suffix tree based methods are the most efficient ones for solving large planted motif problems on sequential machines, they are quite difficult to parallelize. We present suffix tree based parallel solutions for PMP on NVIDIA GPU and Intel Multicore architectures that are efficient and scalable. The solutions are based on a suffix tree algorithm previously presented but use extensive adaptation to individual architectures to ensure that the implementations work efficiently and scale well.
  • Keywords
    DNA; biology computing; computer graphic equipment; coprocessors; genetics; multiprocessing systems; parallel architectures; tree data structures; DNA sequences; Intel multicore machines; NVIDIA GPU; computational biology; parallel architecture; phylogenetic trees; planted motif problem; sequence classification; sequential machines; suffix tree data structure; time 9.9 hr; transcription factor binding sites; Algorithm design and analysis; Graphics processing unit; Instruction sets; Multicore processing; Random access memory; BitBased; DNA; PMP; gSPELLER; mSPELLER; multicore; parallel;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    High Performance Computing and Simulation (HPCS), 2011 International Conference on
  • Conference_Location
    Istanbul
  • Print_ISBN
    978-1-61284-380-3
  • Type

    conf

  • DOI
    10.1109/HPCSim.2011.5999825
  • Filename
    5999825