DocumentCode
1762029
Title
An Efficient Exact Algorithm for the Motif Stem Search Problem over Large Alphabets
Author
Qiang Yu ; Hongwei Huo ; Vitter, Jeffrey Scott ; Jun Huan ; Nekrich, Yakov
Author_Institution
Sch. of Comput. Sci. & Technol., Xidian Univ., Xi'an, China
Volume
12
Issue
2
fYear
2015
fDate
March-April 2015
Firstpage
384
Lastpage
397
Abstract
In recent years, there has been increasing interest in planted (l, d) motif search (PMS), with applications to discovering significant segments in biological sequences. However, there has been little discussion of PMS over large alphabets. This paper focuses on motif stem search (MSS), which was recently introduced to search for motifs in large-alphabet inputs. A motif stem is an l-length string with some wildcards. The goal of the MSS problem is to find a set of stems that represents a superset of all (l, d) motifs present in the input sequences, and the superset is expected to be as small as possible. The three main contributions of this paper are as follows: (1) We build a more precise motif stem representation by using regular expressions. (2) We give a method for generating all possible motif stems without redundant wildcards. (3) We propose an efficient exact algorithm, called StemFinder, for solving the MSS problem. Compared with previous MSS algorithms, StemFinder runs much faster and reports fewer stems, which represent a smaller superset of all (l, d) motifs. StemFinder is freely available at http://sites.google.com/site/feqond/stemfinder.
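The definitions in the abstract can be illustrated with a minimal sketch. It is not StemFinder and does not use the paper's regular-expression representation. It assumes the usual PMS formulation, in which an l-length string is an (l, d) motif if every input sequence contains an l-mer within Hamming distance d of it, and it assumes a stem wildcard matches any single alphabet symbol. The names `WILDCARD`, `hamming`, `is_motif`, and `expand_stem` are hypothetical and chosen for illustration only.

```python
from itertools import product

WILDCARD = "*"  # assumed wildcard symbol; the paper's notation may differ


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_motif(candidate: str, sequences: list[str], d: int) -> bool:
    """True if every sequence contains an l-mer within Hamming distance d."""
    l = len(candidate)
    return all(
        any(hamming(candidate, s[i:i + l]) <= d for i in range(len(s) - l + 1))
        for s in sequences
    )


def expand_stem(stem: str, alphabet: str) -> list[str]:
    """All concrete strings a stem stands for (each wildcard = any symbol)."""
    choices = [alphabet if c == WILDCARD else c for c in stem]
    return ["".join(p) for p in product(*choices)]
```

Brute-force expansion grows as |alphabet|^(number of wildcards), which hints at why large alphabets make reporting stems, rather than every individual motif, attractive.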
Keywords
biochemistry; molecular biophysics; proteins; statistical analysis; MSS problem; biological sequences; efficient exact algorithm; input sequences; l-length string; large alphabets; large-alphabet inputs; motif stem representation; motif stem search problem; planted motif search; redundant wildcards; stem finder; Algorithm design and analysis; Bioinformatics; Computational biology; Hamming distance; IEEE transactions; Radiation detectors; Silicon; Motif stem search; pattern driven; regular expressions;
fLanguage
English
Journal_Title
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher
IEEE
ISSN
1545-5963
Type
jour
DOI
10.1109/TCBB.2014.2361668
Filename
6917035