• DocumentCode
    1762029
  • Title

    An Efficient Exact Algorithm for the Motif Stem Search Problem over Large Alphabets

  • Author

    Qiang Yu ; Hongwei Huo ; Vitter, Jeffrey Scott ; Jun Huan ; Nekrich, Yakov

  • Author_Institution
    Sch. of Comput. Sci. & Technol., Xidian Univ., Xi'an, China
  • Volume
    12
  • Issue
    2
  • fYear
    2015
  • fDate
    March-April 2015
  • Firstpage
    384
  • Lastpage
    397
  • Abstract
    In recent years, there has been increasing interest in planted (l, d) motif search (PMS), with applications to discovering significant segments in biological sequences. However, there has been little discussion of PMS over large alphabets. This paper focuses on motif stem search (MSS), which was recently introduced to search for motifs in large-alphabet inputs. A motif stem is an l-length string with some wildcards. The goal of the MSS problem is to find a set of stems that represents a superset of all (l, d) motifs present in the input sequences, where the superset is expected to be as small as possible. The three main contributions of this paper are as follows: (1) We build a more precise motif stem representation by using regular expressions. (2) We give a method for generating all possible motif stems without redundant wildcards. (3) We propose an efficient exact algorithm, called StemFinder, for solving the MSS problem. Compared with previous MSS algorithms, StemFinder runs much faster and reports fewer stems, which represent a smaller superset of all (l, d) motifs. StemFinder is freely available at http://sites.google.com/site/feqond/stemfinder.
  • Keywords
    biochemistry; molecular biophysics; proteins; statistical analysis; MSS problem; biological sequences; efficient exact algorithm; input sequences; l-length string; large alphabets; large-alphabet inputs; motif stem representation; motif stem search problem; planted motif search; redundant wildcards; stem finder; Algorithm design and analysis; Bioinformatics; Computational biology; Hamming distance; IEEE transactions; Radiation detectors; Silicon; Motif stem search; pattern driven; regular expressions;
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type

    jour

  • DOI
    10.1109/TCBB.2014.2361668
  • Filename
    6917035