DocumentCode
2530912
Title
SAIL-APPROX: An Efficient On-Line Algorithm for Approximate Pattern Matching with Wildcards and Length Constraints
Author
He, Dan ; Wu, Xindong ; Zhu, Xingquan
Author_Institution
Univ. of Vermont, Burlington
fYear
2007
fDate
2-4 Nov. 2007
Firstpage
151
Lastpage
158
Abstract
Finding patterns in biological sequences has a significant impact on many real-world applications such as biological sequence analysis, text indexing, stream data mining, and sensor networking. The problem of Pattern Matching with Wildcards and Length Constraints is to find all occurrences of a pattern P in a text T, which can be a biological sequence, a text string, etc. The user can specify a range for the number of wildcards between each pair of consecutive letters in P, as well as length constraints on P. A further constraint is the one-off condition, under which each character in T can be used at most once in matching P. The on-line version of this problem is to report an occurrence of the given pattern that satisfies all constraints as soon as that occurrence appears in the portion of T read so far. An existing algorithm, SAIL, finds the optimal solution for the on-line version of this problem. However, SAIL only handles exact pattern matching. In this paper, we propose an efficient on-line algorithm for approximate pattern matching with wildcards and length constraints, which is a more general problem than exact matching. We apply dynamic programming in our algorithm and prove that our algorithm is correct.
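The constraints described in the abstract can be illustrated with a small Python sketch. It is not SAIL or SAIL-APPROX, and it is neither on-line nor guaranteed to find the largest set of occurrences. It is a brute-force checker and a greedy left-to-right search, meant only to make the problem statement concrete. The function names, the 0-based position tuples, and the reading of "approximate" as "at most max_mismatches mismatched letters" are assumptions made for this illustration and are not taken from the paper.

```python
from typing import List, Optional, Sequence, Set, Tuple

Gap = Tuple[int, int]  # (min wildcards, max wildcards) between two pattern letters


def is_occurrence(text: str, pattern: str, gaps: Sequence[Gap],
                  positions: Sequence[int], min_len: int, max_len: int,
                  max_mismatches: int = 0) -> bool:
    """Check whether `positions` (0-based indices into `text`) satisfy all constraints."""
    if not pattern or len(positions) != len(pattern) or len(gaps) != len(pattern) - 1:
        return False
    if positions[0] < 0 or positions[-1] >= len(text):
        return False
    # Local constraints: number of wildcards between consecutive pattern letters.
    for i, (lo, hi) in enumerate(gaps):
        wildcards = positions[i + 1] - positions[i] - 1
        if not lo <= wildcards <= hi:
            return False
    # Global length constraint on the span covered by the occurrence.
    span = positions[-1] - positions[0] + 1
    if not min_len <= span <= max_len:
        return False
    # Assumed notion of "approximate": a bounded number of mismatched letters.
    mismatches = sum(text[p] != c for p, c in zip(positions, pattern))
    return mismatches <= max_mismatches


def find_occurrences(text: str, pattern: str, gaps: Sequence[Gap],
                     min_len: int, max_len: int,
                     max_mismatches: int = 0) -> List[Tuple[int, ...]]:
    """Greedy brute-force search that respects the one-off condition."""
    used: Set[int] = set()  # text positions already consumed by an occurrence
    found: List[Tuple[int, ...]] = []

    def extend(prefix: List[int]) -> Optional[List[int]]:
        if len(prefix) == len(pattern):
            ok = is_occurrence(text, pattern, gaps, prefix,
                               min_len, max_len, max_mismatches)
            return prefix if ok else None
        lo, hi = gaps[len(prefix) - 1]
        for w in range(lo, hi + 1):
            nxt = prefix[-1] + w + 1
            if nxt >= len(text):
                break
            if nxt in used:  # one-off condition: skip consumed positions
                continue
            result = extend(prefix + [nxt])
            if result is not None:
                return result
        return None

    for start in range(len(text)):
        if start in used:
            continue
        occ = extend([start])
        if occ is not None:
            found.append(tuple(occ))
            used.update(occ)
    return found
```

For example, calling find_occurrences("aabb", "ab", [(0, 1)], 2, 3) searches for the letter a followed by b with zero or one wildcard in between and a total span of two to three characters, never reusing a text position across occurrences.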
Keywords
biocybernetics; biology computing; dynamic programming; pattern matching; SAIL-APPROX; approximate pattern matching algorithm; biological sequence pattern finding; dynamic programming; length constraint; one-off condition; online algorithm; pattern matching problem; wildcard; Bioinformatics; Biology; Biosensors; Computer science; Data mining; Indexing; Pattern analysis; Pattern matching; Sequences; USA Councils;
fLanguage
English
Publisher
IEEE
Conference_Titel
Bioinformatics and Biomedicine, 2007. BIBM 2007. IEEE International Conference on
Conference_Location
Fremont, CA
Print_ISBN
978-0-7695-3031-4
Type
conf
DOI
10.1109/BIBM.2007.48
Filename
4413049