DocumentCode :
1560563
Title :
String matching with stopper compression
Author :
Rautio, Jussi ; Tanninen, Jani ; Tarhio, Jorma
Author_Institution :
Lab. of Inf. Process. Sci., Helsinki Univ. of Technol., Finland
fYear :
2002
fDate :
2002
Firstpage :
469
Abstract :
Summary form only given. We consider string searching in compressed texts, using a compression method related to static Huffman compression. Characters are encoded as variable-length sequences of base symbols, each consisting of a fixed number of bits. Because the number of base symbols in a code varies, we divide the base symbols into stoppers and continuers so that the start of each new code can be recognized. A stopper can be used only as the last base symbol of a code; all other base symbols are continuers, which can be used anywhere except as the last base symbol of a code. Our searching algorithm is a variation of the Boyer-Moore-Horspool algorithm. The shift function is based on several base symbols in order to achieve longer jumps than the ordinary occurrence heuristic. If four bits are used for base symbols, we apply bytes of eight bits for shift calculation.
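The scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions made for illustration only: the function names (`code_stream`, `build_codes`, `encode`, `search`, `find_all`), the convention that the lowest symbol values are the stoppers, and the frequency-ranked code assignment. Base symbols are kept as plain integers rather than packed four-bit fields, and the shift table uses a single base symbol (the ordinary Horspool heuristic), not the multi-symbol shift function the abstract mentions.

```python
from collections import Counter
from itertools import product


def code_stream(base, num_stoppers):
    """Yield codes in order of increasing length.

    Each code is zero or more continuers followed by exactly one stopper.
    Assumption: symbols 0..num_stoppers-1 are stoppers, the rest continuers.
    """
    assert 0 < num_stoppers < base
    length = 0
    while True:
        for prefix in product(range(num_stoppers, base), repeat=length):
            for stopper in range(num_stoppers):
                yield prefix + (stopper,)
        length += 1


def build_codes(text, base=16, num_stoppers=10):
    """Give the most frequent characters the shortest codes (static model)."""
    ranked = [ch for ch, _ in Counter(text).most_common()]
    return dict(zip(ranked, code_stream(base, num_stoppers)))


def encode(text, codes):
    """Concatenate the codes of all characters into one base-symbol list."""
    out = []
    for ch in text:
        out.extend(codes[ch])
    return out


def search(symbols, pattern, base=16, num_stoppers=10):
    """Horspool search over base symbols; returns base-symbol offsets.

    A candidate match counts only if it starts at a code boundary, that is,
    at offset 0 or directly after a stopper.
    """
    m, n = len(pattern), len(symbols)
    if m == 0 or m > n:
        return []
    shift = [m] * base
    for i in range(m - 1):
        shift[pattern[i]] = m - 1 - i
    hits = []
    pos = 0
    while pos <= n - m:
        if symbols[pos:pos + m] == pattern:
            if pos == 0 or symbols[pos - 1] < num_stoppers:
                hits.append(pos)
        pos += shift[symbols[pos + m - 1]]
    return hits


def find_all(text, pattern, base=16, num_stoppers=10):
    """Compress text, search the compressed form, report character indices."""
    codes = build_codes(text, base, num_stoppers)
    if any(ch not in codes for ch in pattern):
        return []
    symbols = encode(text, codes)
    hits = search(symbols, encode(pattern, codes), base, num_stoppers)
    # The number of stoppers before an offset equals the number of
    # complete characters before it.
    return [sum(1 for s in symbols[:h] if s < num_stoppers) for h in hits]
```

With only two stoppers, `find_all("abracadabra", "abra", num_stoppers=2)` returns `[0, 7]`. In this setting some characters need two base symbols, so the code-boundary check is what rejects candidate matches that begin in the middle of a code.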
Keywords :
Huffman codes; binary sequences; data compression; search problems; string matching; text analysis; variable length codes; Boyer-Moore-Horspool algorithm; base symbols; compressed texts; continuers; shift function; static Huffman compression; stopper compression; string searching; variable length sequences
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Compression Conference, 2002. Proceedings. DCC 2002
ISSN :
1068-0314
Print_ISBN :
0-7695-1477-4
Type :
conf
DOI :
10.1109/DCC.2002.1000012
Filename :
1000012