Title :
Matching a Set of Patterns with Wildcards
Author :
Zhang, Meng ; Zhang, Yi ; Tang, Jijun
Author_Institution :
Coll. of Comput. Sci. & Technol., Jilin Univ., Changchun, China
Abstract :
Multi-pattern matching with wildcards is the problem of finding all occurrences of a set of patterns with wildcards in a text. The problem arises in various fields, such as computational biology and network security, but it has not been studied as extensively as the single-pattern case, and there has been no efficient algorithm for it. In this paper, we present efficient algorithms based on fast Fourier transforms (FFTs). Let $P = \{p_1, \ldots, p_k\}$ be a set of patterns with wildcards whose total length is $|P|$, and let $t$ be a text of length $n$ over the alphabet $\{a_1, \ldots, a_\sigma\}$. We present two algorithms for this problem in which all patterns are matched simultaneously. The first algorithm finds the matches of a small set of patterns in the text in $O(n \log |P| + nk)$ time, using words of $k\lceil 2 \lg \sigma \rceil + \sum_{i=1}^{k} \lceil \lg |p_i| \rceil$ bits. The second algorithm finds the matches of the patterns in the text in $O(n \log |P| \log \sigma + nk)$ time by computing the Hamming distance between the patterns and the text, using words of $\sum_{i=1}^{k} \lceil \lg |p_i| \rceil$ bits. We also demonstrate an FFT implementation based on modular arithmetic for machines with a 64-bit word size. Finally, we show that both algorithms can be easily parallelized, and we give the parallelized algorithms as well.
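The abstract does not spell out the convolution step behind FFT-based wildcard matching, so the sketch below illustrates the general idea for a single pattern. It is not the authors' multi-pattern algorithm: it uses the well-known sum $S(i) = \sum_j p_j t_{i+j} (p_j - t_{i+j})^2$, with the wildcard encoded as 0 and letters as $1, \ldots, \sigma$, so $S(i) = 0$ exactly when the pattern matches at position $i$. The three resulting cross-correlations are computed with a number-theoretic transform (an FFT over integers modulo a prime), in the spirit of the modular-arithmetic FFT mentioned above. The modulus 998244353 and the names `ntt`, `convolve`, and `wildcard_match` are hypothetical choices for illustration, not taken from the paper.

```python
# Sketch: single-pattern wildcard matching via exact modular FFT (NTT).
# Assumption: MOD = 998244353 = 119 * 2^23 + 1 with primitive root 3;
# this is NOT necessarily the modulus used in the paper.
MOD = 998244353
G = 3

def ntt(a, invert=False):
    """In-place iterative number-theoretic transform; len(a) must be a power of 2."""
    n = len(a)
    j = 0
    for i in range(1, n):  # bit-reversal permutation
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        w = pow(G, (MOD - 1) // length, MOD)  # primitive length-th root of unity
        if invert:
            w = pow(w, MOD - 2, MOD)
        half = length // 2
        for start in range(0, n, length):
            wn = 1
            for k in range(start, start + half):
                u, v = a[k], a[k + half] * wn % MOD
                a[k] = (u + v) % MOD
                a[k + half] = (u - v) % MOD
                wn = wn * w % MOD
        length <<= 1
    if invert:
        n_inv = pow(n, MOD - 2, MOD)
        for i in range(n):
            a[i] = a[i] * n_inv % MOD

def convolve(x, y):
    """Exact convolution modulo MOD."""
    out_len = len(x) + len(y) - 1
    size = 1
    while size < out_len:
        size <<= 1
    fx = x + [0] * (size - len(x))
    fy = y + [0] * (size - len(y))
    ntt(fx)
    ntt(fy)
    prod = [a * b % MOD for a, b in zip(fx, fy)]
    ntt(prod, invert=True)
    return prod[:out_len]

def wildcard_match(text, pattern, alphabet, wildcard="?"):
    """Return every i where pattern (which may contain wildcards) matches text[i:]."""
    code = {c: idx + 1 for idx, c in enumerate(alphabet)}
    t = [0 if c == wildcard else code[c] for c in text]
    p = [0 if c == wildcard else code[c] for c in pattern]
    n, m = len(t), len(p)
    if m == 0 or m > n:
        return []
    # Each term p*t*(p-t)^2 is at most sigma^4, so S(i) < MOD and is recovered exactly.
    assert m * len(alphabet) ** 4 < MOD, "S(i) could wrap around the modulus"
    pr = p[::-1]  # reversing the pattern turns convolution into correlation

    def corr(pe, te):
        # corr[i] = sum_j p_j^pe * t_{i+j}^te
        c = convolve([x ** te % MOD for x in t], [x ** pe % MOD for x in pr])
        return [c[i + m - 1] for i in range(n - m + 1)]

    a, b, c = corr(3, 1), corr(2, 2), corr(1, 3)
    # S(i) = sum p^3 t - 2 sum p^2 t^2 + sum p t^3
    return [i for i in range(n - m + 1) if (a[i] - 2 * b[i] + c[i]) % MOD == 0]
```

For example, `wildcard_match("abcabd", "ab?", "abcd")` reports positions 0 and 3. Each pattern costs $O(n \log n)$ time here, so matching a set of $k$ patterns by calling the function once per pattern costs $O(kn \log n)$; the algorithms in the paper instead match all patterns simultaneously. Because the modulus is below $2^{30}$, every product of two residues fits in a 64-bit machine word.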
Keywords :
Hamming codes; fast Fourier transforms; set theory; string matching; text analysis; Hamming distance; fast Fourier transform; pattern matching; set of patterns; wildcard; Arrays; Computer science; Convolution; Electronic mail; Program processors; Algorithm; FFT; Multi-pattern matching; Wildcards
Conference_Title :
Parallel Architectures, Algorithms and Programming (PAAP), 2010 Third International Symposium on
Conference_Location :
Dalian
Print_ISBN :
978-1-4244-9482-8
DOI :
10.1109/PAAP.2010.70