Title :
Compressed Index for Dictionary Matching
Author :
Hon, Wing-Kai ; Shah, Rahul ; Vitter, Jeffrey Scott ; Lam, Tak-Wah ; Tam, Siu-Lung
Author_Institution :
Nat. Tsing Hua Univ., Hsinchu
Abstract :
The past few years have witnessed several exciting results on compressed representations of a string T that support efficient pattern matching, with the space complexity reduced to |T| Hk(T) + o(|T| log sigma) bits, where Hk(T) denotes the kth-order empirical entropy of T and sigma is the size of the alphabet. In this paper we study compressed representations for another classical string indexing problem, known in the literature as dictionary matching. Precisely, a collection D of strings (called patterns) of total length n is to be indexed so that, given a text T, the occurrences of the patterns in T can be found efficiently. We show how to exploit a sampling technique to compress the existing O(n)-word index to an (n Hk(D) + o(n log sigma))-bit index with only a small sacrifice in search time.
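To make the problem statement concrete, the following is a minimal sketch of dictionary matching using a classical uncompressed Aho-Corasick automaton. It is not the compressed index described in the abstract; it only illustrates the input (a pattern collection D and a text T) and the output (all occurrences of the patterns in T). The function names `build_automaton` and `find_occurrences` are hypothetical, and patterns are assumed to be non-empty.

```python
from collections import deque

def build_automaton(patterns):
    """Build a trie with failure links over the (non-empty) patterns."""
    goto = [{}]    # goto[s]: outgoing trie edges of state s (char -> state)
    fail = [0]     # fail[s]: state for the longest proper suffix of s in the trie
    out = [[]]     # out[s]: indices of all patterns ending at state s
    for idx, pattern in enumerate(patterns):
        s = 0
        for ch in pattern:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(idx)
    # Breadth-first traversal so that shallower states are finished first.
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            # Inherit patterns that end at the failure state (proper suffixes).
            out[t] = out[t] + out[fail[t]]
            queue.append(t)
    return goto, fail, out

def find_occurrences(patterns, text):
    """Return (start_position, pattern) for every pattern occurrence in text."""
    goto, fail, out = build_automaton(patterns)
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for idx in out[s]:
            hits.append((i - len(patterns[idx]) + 1, patterns[idx]))
    return hits
```

For example, with D = {"he", "she", "his", "hers"} and T = "ushers", the sketch reports "she" at position 1 and both "he" and "hers" at position 2 (0-based). The automaton stores explicit pointers for every trie node, which is the kind of O(n)-word space usage the abstract aims to reduce.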
Keywords :
computational complexity; data compression; indexing; string matching; compressed index; dictionary matching; pattern matching; sampling technique; string indexing; Bioinformatics; Databases; Dictionaries; Entropy; Genomics; Humans; Sampling methods; Compression
Conference_Title :
Data Compression Conference, 2008. DCC 2008
Conference_Location :
Snowbird, UT
Print_ISBN :
978-0-7695-3121-2
DOI :
10.1109/DCC.2008.62