Title :
Faster algorithms for string matching problems: matching the convolution bound
Author_Institution :
Dept. of Comput. Sci., Stanford Univ., CA, USA
Abstract :
In this paper we give a randomized O(n log n)-time algorithm for the string matching with don't cares problem. This improves the Fischer-Paterson bound from 1974 and answers the open problem posed (among others) by Weiner and Galil. Using the same technique, we give an O(n log n)-time algorithm for other problems, including subset matching, tree pattern matching, (general) approximate threshold matching and point set matching. As this bound essentially matches the complexity of computing the fast Fourier transform, which is the only known technique for solving problems of this type, it is likely that the algorithms are in fact optimal. Additionally, the technique used for the threshold matching problem can be applied to the on-line version of this problem, in which we are allowed to preprocess the text and are required to process the pattern in time sublinear in the text length. This result involves an interesting variant of the Karp-Rabin fingerprint method in which hash functions are locality-sensitive, i.e. the probability of collision of two words depends on the distance between them.
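To illustrate why convolution is the natural tool for matching with don't cares, the following is a minimal sketch of a standard deterministic formulation. It is not the randomized algorithm of the paper, and it does not attempt to reproduce the paper's running time. Characters are mapped to positive integers and the wildcard to 0, so that the sum over j of p_j * t_(i+j) * (p_j - t_(i+j))^2 is zero exactly when the pattern matches at alignment i. Expanding the square gives three correlations, each computed with an FFT. The names `fft`, `correlate` and `wildcard_match`, and the choice of "?" as the wildcard symbol, are assumptions made for this example.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out


def correlate(text_vals, pat_vals):
    """Return c[i] = sum_j pat_vals[j] * text_vals[i + j] for each alignment i."""
    n, m = len(text_vals), len(pat_vals)
    size = 1
    while size < n + m:
        size *= 2
    fa = fft([complex(v) for v in text_vals] + [0j] * (size - n))
    # Reversing the pattern turns the correlation into a convolution.
    fb = fft([complex(v) for v in reversed(pat_vals)] + [0j] * (size - m))
    prod = fft([x * y for x, y in zip(fa, fb)], invert=True)
    # Entry m - 1 + i of the linear convolution corresponds to alignment i.
    return [round(prod[m - 1 + i].real / size) for i in range(n - m + 1)]


def wildcard_match(text, pattern, wildcard="?"):
    """Return all alignments where pattern matches text, wildcards allowed in both."""
    if len(pattern) > len(text):
        return []
    # Map each distinct non-wildcard character to 1..k; the wildcard maps to 0.
    alphabet = sorted(set(text + pattern) - {wildcard})
    code = {ch: idx + 1 for idx, ch in enumerate(alphabet)}
    code[wildcard] = 0
    t = [code[ch] for ch in text]
    p = [code[ch] for ch in pattern]
    # sum p*t*(p - t)^2 = sum p^3*t - 2*sum p^2*t^2 + sum p*t^3
    a = correlate(t, [x ** 3 for x in p])
    b = correlate([x ** 2 for x in t], [x ** 2 for x in p])
    c = correlate([x ** 3 for x in t], p)
    return [i for i in range(len(a)) if a[i] - 2 * b[i] + c[i] == 0]


print(wildcard_match("abcabdabc", "ab?"))
```

The floating-point FFT is rounded to the nearest integer, which is safe only for short inputs and small alphabets; a production version would need an exact transform or a careful error analysis.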
Keywords :
computational complexity; fast Fourier transforms; randomised algorithms; string matching; Karp-Rabin fingerprint method; approximate threshold matching; complexity; convolution bound; don't cares problem; fast Fourier transform; hash functions; point set matching; randomized algorithm; string matching problems; subset matching; threshold matching problem; tree pattern matching; Automata; Bridges; Communication switching; Convolution; Fast Fourier transforms; Hamming distance; Pattern matching; Sampling methods;
Conference_Title :
Foundations of Computer Science, 1998. Proceedings. 39th Annual Symposium on
Conference_Location :
Palo Alto, CA
Print_ISBN :
0-8186-9172-7
DOI :
10.1109/SFCS.1998.743440