DocumentCode :
1762029
Title :
An Efficient Exact Algorithm for the Motif Stem Search Problem over Large Alphabets
Author :
Qiang Yu ; Hongwei Huo ; Jeffrey Scott Vitter ; Jun Huan ; Yakov Nekrich
Author_Institution :
Sch. of Comput. Sci. & Technol., Xidian Univ., Xi'an, China
Volume :
12
Issue :
2
fYear :
2015
fDate :
March-April 2015
Firstpage :
384
Lastpage :
397
Abstract :
In recent years, there has been increasing interest in planted (l, d) motif search (PMS), with applications to discovering significant segments in biological sequences. However, there has been little discussion of PMS over large alphabets. This paper focuses on motif stem search (MSS), which was recently introduced to search for motifs in large-alphabet inputs. A motif stem is an l-length string with some wildcards. The goal of the MSS problem is to find a set of stems that represents a superset of all (l, d) motifs present in the input sequences, where the superset is expected to be as small as possible. The three main contributions of this paper are as follows: (1) we build a more precise motif stem representation by using regular expressions; (2) we give a method for generating all possible motif stems without redundant wildcards; and (3) we propose an efficient exact algorithm, called StemFinder, for solving the MSS problem. Compared with previous MSS algorithms, StemFinder runs much faster and reports fewer stems, which represent a smaller superset of all (l, d) motifs. StemFinder is freely available at http://sites.google.com/site/feqond/stemfinder.
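The following Python sketch illustrates the standard (l, d) motif condition referred to above. Under that condition, an l-length string m is an (l, d) motif if every input sequence contains some l-length substring within Hamming distance d of m. The sketch also shows a simplified notion of a stem, assuming a hypothetical single-character wildcard `*` that matches any symbol. The paper's regular-expression stem representation is more precise than this, and none of the code reflects StemFinder itself. The names `hamming`, `is_ld_motif`, `stem_matches`, and `WILDCARD` are illustrative only.

```python
from typing import List

WILDCARD = "*"  # hypothetical wildcard symbol: matches any single character


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def is_ld_motif(m: str, seqs: List[str], d: int) -> bool:
    """True if every sequence has a len(m)-length window within Hamming distance d of m."""
    l = len(m)
    return all(
        any(hamming(m, s[i:i + l]) <= d for i in range(len(s) - l + 1))
        for s in seqs
    )


def stem_matches(stem: str, m: str) -> bool:
    """True if stem represents m: same length, and every non-wildcard position agrees."""
    return len(stem) == len(m) and all(
        c == WILDCARD or c == x for c, x in zip(stem, m)
    )


if __name__ == "__main__":
    seqs = ["ACGTTGCA", "TTACGAAA", "GGACCTAC"]
    print(is_ld_motif("ACGT", seqs, 1))   # every sequence has a window within distance 1
    print(is_ld_motif("ACGT", seqs, 0))   # the second sequence has no exact occurrence
    print(stem_matches("AC*T", "ACGT"))
```

The code is alphabet-agnostic. Checking one candidate is cheap, but enumerating all |Σ|^l candidate strings grows quickly with the alphabet size (for example, 20 symbols for amino acids), which is the large-alphabet setting that MSS targets.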
Keywords :
biochemistry; molecular biophysics; proteins; statistical analysis; MSS problem; biological sequences; efficient exact algorithm; input sequences; l-length string; large alphabets; large-alphabet inputs; motif stem representation; motif stem search problem; planted motif search; redundant wildcards; stem finder; Algorithm design and analysis; Bioinformatics; Computational biology; Hamming distance; IEEE transactions; Radiation detectors; Silicon; Motif stem search; pattern driven; regular expressions;
fLanguage :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/TCBB.2014.2361668
Filename :
6917035