DocumentCode
3225263
Title
Compressed Index for Dictionary Matching
Author
Hon, Wing-Kai ; Shah, Rahul ; Vitter, Jeffrey Scott ; Lam, Tak-Wah ; Tam, Siu-Lung
Author_Institution
Nat. Tsing Hua Univ., Hsinchu
fYear
2008
fDate
25-27 March 2008
Firstpage
23
Lastpage
32
Abstract
The past few years have witnessed several exciting results on compressed representations of a string T that support efficient pattern matching, reducing the space complexity to |T| H_k(T) + o(|T| log σ) bits, where H_k(T) denotes the kth-order empirical entropy of T and σ is the size of the alphabet. In this paper we study compressed representations for another classical string indexing problem, known in the literature as dictionary matching. Precisely, a collection D of strings (called patterns) of total length n is to be indexed so that, given a text T, the occurrences of the patterns in T can be found efficiently. We show how to exploit a sampling technique to compress the existing O(n)-word index to an (n H_k(D) + o(n log σ))-bit index with only a small sacrifice in search time.
Keywords
computational complexity; data compression; indexing; string matching; compressed index; dictionary matching; pattern matching; sampling technique; string indexing; Bioinformatics; Databases; Dictionaries; Entropy; Genomics; Humans; Sampling methods; Compression
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Compression Conference, 2008. DCC 2008
Conference_Location
Snowbird, UT
ISSN
1068-0314
Print_ISBN
978-0-7695-3121-2
Type
conf
DOI
10.1109/DCC.2008.62
Filename
4483280