DocumentCode :
3225263
Title :
Compressed Index for Dictionary Matching
Author :
Hon, Wing-Kai ; Shah, Rahul ; Vitter, Jeffrey Scott ; Lam, Tak-Wah ; Tam, Siu-Lung
Author_Institution :
Nat. Tsing Hua Univ., Hsinchu
fYear :
2008
fDate :
25-27 March 2008
Firstpage :
23
Lastpage :
32
Abstract :
The past few years have witnessed several exciting results on compressed representation of a string T that supports efficient pattern matching, and the space complexity has been reduced to |T| H_k(T) + o(|T| log σ) bits, where H_k(T) denotes the kth-order empirical entropy of T, and σ is the size of the alphabet. In this paper we study compressed representation for another classical problem of string indexing, which is called dictionary matching in the literature. Precisely, a collection D of strings (called patterns) of total length n is to be indexed so that, given a text T, the occurrences of the patterns in T can be found efficiently. We show how to exploit a sampling technique to compress the existing O(n)-word index to an (n H_k(D) + o(n log σ))-bit index with only a small sacrifice in search time.
Keywords :
computational complexity; data compression; indexing; string matching; compressed index; dictionary matching; pattern matching; sampling technique; string indexing; Bioinformatics; Data compression; Databases; Dictionaries; Entropy; Genomics; Humans; Indexing; Pattern matching; Sampling methods; Compression; Dictionary Matching; Entropy; Indexing; Pattern Matching;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Compression Conference, 2008. DCC 2008
Conference_Location :
Snowbird, UT
ISSN :
1068-0314
Print_ISBN :
978-0-7695-3121-2
Type :
conf
DOI :
10.1109/DCC.2008.62
Filename :
4483280