Title :
SAIL-APPROX: An Efficient On-Line Algorithm for Approximate Pattern Matching with Wildcards and Length Constraints
Author :
He, Dan ; Wu, Xindong ; Zhu, Xingquan
Author_Institution :
Univ. of Vermont, Burlington
Abstract :
Finding patterns in biological sequences has a significant impact on many real-world applications such as biological sequence analysis, text indexing, stream data mining, and sensor networking. The problem of Pattern Matching with Wildcards and Length Constraints is to find all occurrences of a pattern P in a text T, which can be a biological sequence, a text string, etc. The user can specify a range for the number of wildcards between every two consecutive letters in P, as well as length constraints on P. Another constraint is the one-off condition, under which every character in T can be used at most once for matching with P. The on-line version of this problem is to report an occurrence of the given pattern that satisfies all constraints as soon as that occurrence appears in the portion of T read so far. An existing algorithm, SAIL, finds the optimal solution for the on-line version of this problem, but it handles only exact pattern matching. In this paper, we propose an efficient on-line algorithm for approximate pattern matching with wildcards and length constraints, which is a more general problem than exact matching. Our algorithm applies dynamic programming, and we prove that it is correct.
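The problem statement in the abstract can be made concrete with a small brute-force sketch. This is not SAIL or SAIL-APPROX, and it is neither on-line nor efficient; it only enumerates the occurrences that the constraints allow. The function name `occurrences`, its parameters, and the use of a mismatch count (`max_mismatches`) as the notion of "approximate" are assumptions made for illustration, since the abstract does not define the error measure. Length is taken here to be the span from the first to the last matched position, and the one-off condition is deliberately not enforced.

```python
from typing import Iterator, List, Sequence, Tuple


def occurrences(
    text: str,
    letters: str,
    gaps: Sequence[Tuple[int, int]],
    min_len: int,
    max_len: int,
    max_mismatches: int = 0,
) -> Iterator[Tuple[int, ...]]:
    """Yield tuples of text positions, one position per pattern letter.

    gaps[j] = (lo, hi) is the allowed number of wildcards between
    letters[j] and letters[j + 1]. An occurrence is kept when its span
    (last position - first position + 1) lies in [min_len, max_len] and
    at most max_mismatches pattern letters differ from the text.
    Hypothetical helper: the one-off condition is NOT checked here.
    """
    if len(gaps) != len(letters) - 1:
        raise ValueError("need exactly one gap range between consecutive letters")

    def extend(positions: List[int], mismatches: int) -> Iterator[Tuple[int, ...]]:
        j = len(positions)  # index of the next pattern letter to place
        if j == len(letters):
            span = positions[-1] - positions[0] + 1
            if min_len <= span <= max_len:
                yield tuple(positions)
            return
        lo, hi = gaps[j - 1]
        prev = positions[-1]
        # Skipping g wildcards puts the next letter at prev + g + 1.
        last = min(prev + hi + 1, len(text) - 1)
        for pos in range(prev + lo + 1, last + 1):
            cost = mismatches + (text[pos] != letters[j])
            if cost <= max_mismatches:
                yield from extend(positions + [pos], cost)

    for start in range(len(text)):
        cost = int(text[start] != letters[0])
        if cost <= max_mismatches:
            yield from extend([start], cost)
```

For example, `list(occurrences("ACGTAC", "AC", [(0, 1)], 2, 3))` enumerates the exact occurrences of `A`, zero or one wildcard, then `C`; passing `max_mismatches=1` additionally admits occurrences in which one pattern letter disagrees with the text. An on-line algorithm would instead report each occurrence as soon as its last character is read.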
Keywords :
biocybernetics; biology computing; dynamic programming; pattern matching; SAIL-APPROX; approximate pattern matching algorithm; biological sequence pattern finding; dynamic programming; length constraint; one-off condition; online algorithm; pattern matching problem; wildcard; Bioinformatics; Biology; Biosensors; Computer science; Data mining; Indexing; Pattern analysis; Pattern matching; Sequences; USA Councils;
Conference_Title :
Bioinformatics and Biomedicine, 2007. BIBM 2007. IEEE International Conference on
Conference_Location :
Fremont, CA
Print_ISBN :
978-0-7695-3031-4
DOI :
10.1109/BIBM.2007.48