• DocumentCode
    2530912
  • Title
    SAIL-APPROX: An Efficient On-Line Algorithm for Approximate Pattern Matching with Wildcards and Length Constraints
  • Author
    He, Dan ; Wu, Xindong ; Zhu, Xingquan
  • Author_Institution
    Univ. of Vermont, Burlington
  • fYear
    2007
  • fDate
    2-4 Nov. 2007
  • Firstpage
    151
  • Lastpage
    158
  • Abstract
    Finding patterns in biological sequences has a significant impact on many real-world applications such as biological sequence analysis, text indexing, stream data mining, and sensor networking. The problem of Pattern Matching with Wildcards and Length Constraints is to find all locations of occurrences of a pattern P in a text T, which can be a biological sequence, text string, etc. The user can specify a varying range for the number of wildcards between every two consecutive letters in P and also the length constraints of P. Another constraint is the one-off condition, where every literal in T can only be used once for matching with P. The on-line version of this problem is to report an occurrence of the given pattern that satisfies all constraints as soon as it appears in the portion of T read so far. An existing algorithm, SAIL, finds the optimal solution for the on-line version of this problem. However, SAIL only handles exact pattern matching. In this paper, we propose an efficient on-line algorithm for approximate pattern matching with wildcards and length constraints, which is a more general problem than exact matching. We apply dynamic programming in our algorithm and prove that our algorithm is correct.
  • Keywords
    biocybernetics; biology computing; dynamic programming; pattern matching; SAIL-APPROX; approximate pattern matching algorithm; biological sequence pattern finding; length constraint; one-off condition; online algorithm; pattern matching problem; wildcard; Bioinformatics; Biology; Biosensors; Computer science; Data mining; Indexing; Pattern analysis; Pattern matching; Sequences; USA Councils
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and Biomedicine, 2007. BIBM 2007. IEEE International Conference on
  • Conference_Location
    Fremont, CA
  • Print_ISBN
    978-0-7695-3031-4
  • Type
    conf
  • DOI
    10.1109/BIBM.2007.48
  • Filename
    4413049