Title :
Fast Motif Selection for Biological Sequences
Author :
Kuksa, Pavel ; Pavlovic, Vladimir
Author_Institution :
Dept. of Comput. Sci., Rutgers Univ., Piscataway, NJ, USA
Abstract :
We consider the problem of identifying motifs (recurring or conserved patterns) in sets of biological sequences. To solve this task, we present new deterministic and exact algorithms for finding patterns that are embedded as exact or inexact instances in all or most of the input strings. The proposed algorithms (1) improve search efficiency compared to existing exact algorithms by focusing the search on a selected set of potential motif instances, and (2) scale well with the input length and the alphabet size. Our algorithms are orders of magnitude faster than existing exact algorithms for common pattern identification. We evaluate our algorithms on benchmark motif finding problems and real applications in biological sequence analysis and show that they exhibit significant running time improvements compared to state-of-the-art approaches.
Keywords :
DNA; biology computing; genetics; genomics; molecular biophysics; proteomics; DNA sequences; benchmark motif; biological sequence analysis; common pattern identification; exact algorithms; fast motif selection; genomic analysis; potential motif instances; proteomic analysis; Algorithm design and analysis; Bioinformatics; Biology; Computer science; Hamming distance; Pattern analysis; Proteins; Sequences; Voting
Conference_Title :
Bioinformatics and Biomedicine, 2009. BIBM '09. IEEE International Conference on
Conference_Location :
Washington, DC
Print_ISBN :
978-0-7695-3885-3
DOI :
10.1109/BIBM.2009.41
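Illustration :
The abstract states the task as finding patterns embedded as exact or inexact instances in all or most of the input strings. The sketch below spells out that problem definition with a naive exhaustive search. It is not the authors' algorithm, and it is the kind of brute-force baseline a faster method would aim to beat. It assumes that "inexact" means at most d Hamming mismatches (the keywords mention Hamming distance) and that "most" means at least min_support sequences; the names hamming, occurs_within, naive_motifs, k, d and min_support are hypothetical and do not come from the paper.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(pattern: str, sequence: str, d: int) -> bool:
    """True if some window of `sequence` is within d mismatches of `pattern`."""
    k = len(pattern)
    return any(
        hamming(pattern, sequence[i:i + k]) <= d
        for i in range(len(sequence) - k + 1)
    )


def naive_motifs(sequences, k, d, min_support, alphabet="ACGT"):
    """Return every k-mer found (up to d mismatches) in >= min_support sequences.

    Assumption: this exhaustive enumeration only illustrates the problem
    statement; its cost grows as len(alphabet) ** k.
    """
    found = []
    for letters in product(alphabet, repeat=k):
        pattern = "".join(letters)
        # Count the sequences that contain at least one instance of the pattern.
        support = sum(occurs_within(pattern, s, d) for s in sequences)
        if support >= min_support:
            found.append(pattern)
    return found


if __name__ == "__main__":
    toy = ["ACGTAC", "TTACGA", "GGACGT"]  # made-up toy input, not benchmark data
    print(naive_motifs(toy, k=3, d=0, min_support=3))
```

Because the loop visits every possible k-mer, the running time grows exponentially with the pattern length and polynomially with the alphabet size, which is why restricting the search to a selected set of candidate instances, as the abstract describes, matters in practice.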