DocumentCode :
2533057
Title :
Matching a Set of Patterns with Wildcards
Author :
Zhang, Meng ; Zhang, Yi ; Tang, Jijun
Author_Institution :
Coll. of Comput. Sci. & Technol., Jilin Univ., Changchun, China
fYear :
2010
fDate :
18-20 Dec. 2010
Firstpage :
169
Lastpage :
174
Abstract :
Multi-pattern matching with wildcards is the problem of finding all occurrences of a set of patterns with wildcards in a text. This problem arises in various fields, such as computational biology and network security. However, it has not been studied as extensively as the single-pattern case, and no efficient algorithm for it is known. In this paper, we present efficient algorithms based on fast Fourier transforms. Let $P = \{p_1, \ldots, p_k\}$ be a set of patterns with wildcards, where the total length of the patterns is $|P|$, and let $t$ be a text of length $n$ over the alphabet $\{a_1, \ldots, a_\sigma\}$. We present two algorithms for this problem in which the patterns are matched simultaneously. The first algorithm finds the matches of a small set of patterns in the text in $O(n \log |P| + nk)$ time. The words used in the algorithm are of size $k\lceil 2 \lg \sigma \rceil + \sum_{i=1}^{k} \lceil \lg |p_i| \rceil$ bits. The second algorithm finds the matches of the patterns in the text in $O(n \log |P| \log \sigma + nk)$ time by computing the Hamming distance between the patterns and the text. It uses words of $\sum_{i=1}^{k} \lceil \lg |p_i| \rceil$ bits. We also demonstrate an FFT implementation based on modular arithmetic for machines with a 64-bit word size. Finally, we show that both algorithms can be easily parallelized, and we give the parallelized algorithms as well.
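The abstract does not spell out the algorithms, so the following is only a minimal sketch of the general idea of FFT-based matching of one pattern with wildcards. It is not the authors' method: it handles a single pattern, uses floating-point FFT rather than the modular-arithmetic FFT mentioned above, and does not pack several patterns into one machine word. The names `fft`, `correlate`, and `wildcard_matches`, the `?` wildcard symbol, and the integer encoding of characters are all assumptions made for illustration. The sketch relies on the standard identity that pattern `p` matches at position `i` exactly when the sum over non-wildcard positions `j` of `(p[j] - t[i+j])^2` is zero, and that this sum expands into two correlations plus a constant.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def correlate(p, t):
    """Return c[i] = sum_j p[j] * t[i + j] for 0 <= i <= len(t) - len(p)."""
    m, n = len(p), len(t)
    size = 1
    while size < n + m:
        size *= 2
    # Reversing p turns the correlation into an ordinary convolution.
    fa = fft([complex(x) for x in reversed(p)] + [0j] * (size - m))
    fb = fft([complex(x) for x in t] + [0j] * (size - n))
    prod = fft([x * y for x, y in zip(fa, fb)], invert=True)
    # The inverse transform is unnormalized, so divide by size; inputs are
    # small integers, so rounding recovers the exact value.
    return [round(prod[m - 1 + i].real / size) for i in range(n - m + 1)]


def wildcard_matches(pattern, text, wildcard="?"):
    """Start positions where pattern matches text (text assumed wildcard-free)."""
    if len(pattern) > len(text):
        return []
    # Hypothetical encoding: each literal character gets a positive integer,
    # the wildcard gets 0 so that it contributes nothing to any sum.
    alphabet = sorted((set(pattern) | set(text)) - {wildcard})
    code = {c: k + 1 for k, c in enumerate(alphabet)}
    p = [0 if c == wildcard else code[c] for c in pattern]
    mask = [0 if c == wildcard else 1 for c in pattern]
    t = [code[c] for c in text]
    const = sum(x * x for x in p)                 # sum of p[j]^2
    pt = correlate(p, t)                          # sum of p[j] * t[i+j]
    mt2 = correlate(mask, [x * x for x in t])     # sum of mask[j] * t[i+j]^2
    return [i for i in range(len(text) - len(pattern) + 1)
            if const - 2 * pt[i] + mt2[i] == 0]


print(wildcard_matches("a?c", "abcaxcabd"))
```

Running this function once per pattern gives a naive multi-pattern baseline; the algorithms described in the abstract instead match all patterns simultaneously.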
Keywords :
Hamming codes; fast Fourier transforms; set theory; string matching; text analysis; Hamming distance; fast Fourier transform; pattern matching; set of pattern; wildcard; Arrays; Computer science; Convolution; Electronic mail; Hamming distance; Pattern matching; Program processors; Algorithm; FFT; Multi-pattern matching; Wildcards;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel Architectures, Algorithms and Programming (PAAP), 2010 Third International Symposium on
Conference_Location :
Dalian
Print_ISBN :
978-1-4244-9482-8
Type :
conf
DOI :
10.1109/PAAP.2010.70
Filename :
5715080