Title :
String matching with stopper compression
Author :
Rautio, Jussi ; Tanninen, Jani ; Tarhio, Jorma
Author_Institution :
Lab. of Inf. Process. Sci., Helsinki Univ. of Technol., Finland
fDate :
2002
Abstract :
Summary form only given. We consider string searching in compressed texts, using a compression method related to static Huffman compression. Characters are encoded as variable-length sequences of base symbols, each of which consists of a fixed number of bits. Because the length of a code in base symbols varies, we divide the base symbols into stoppers and continuers so that the start of each new code can be recognized. Stoppers can be used only as the last base symbol of a code; all other base symbols are continuers, which can be used in any position except the last. Our searching algorithm is a variation of the Boyer-Moore-Horspool algorithm. Its shift function is based on several base symbols, which yields longer jumps than the ordinary occurrence heuristic. If base symbols are four bits wide, we use eight-bit bytes for the shift calculation.
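The scheme described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Base symbols are small integers; values 0..c-1 act as continuers and c..c+s-1 as stoppers (a hypothetical layout). Codes are assigned by frequency rank, giving the shortest codes to the most frequent characters; this simple dense assignment is an assumption and stands in for the authors' Huffman-related construction. The search is a Horspool variant whose shift is looked up from the last q base symbols of the window (q=2 with 4-bit symbols mirrors the byte-based shift). A symbol-level match is accepted only if it starts at position 0 or right after a stopper; since an encoded pattern always ends with a stopper, such a match is aligned with code boundaries at both ends. All function names are hypothetical.

```python
from collections import Counter


def code_for_rank(r, s, c):
    """Code of the r-th most frequent character (hypothetical dense assignment).
    There are s codes of length 1, s*c of length 2, s*c^2 of length 3, ..."""
    assert s >= 1 and c >= 1
    length, first, block = 1, 0, s
    while r >= first + block:          # find the code length for rank r
        first += block
        block *= c
        length += 1
    offset = r - first                 # 0 <= offset < s * c**(length - 1)
    stopper = c + offset % s           # stoppers are the values c..c+s-1
    offset //= s
    continuers = []
    for _ in range(length - 1):        # remaining digits in base c
        continuers.append(offset % c)
        offset //= c
    return continuers[::-1] + [stopper]


def build_code(freqs, s, c):
    """Map each character to its base-symbol sequence, most frequent first."""
    ranked = sorted(freqs, key=lambda ch: (-freqs[ch], ch))
    return {ch: code_for_rank(r, s, c) for r, ch in enumerate(ranked)}


def encode(text, code):
    return [sym for ch in text for sym in code[ch]]


def decode(symbols, code, c):
    inverse = {tuple(v): ch for ch, v in code.items()}
    out, current = [], []
    for sym in symbols:
        current.append(sym)
        if sym >= c:                   # a stopper ends the current code
            out.append(inverse[tuple(current)])
            current = []
    return "".join(out)


def search(t, p, c, q=2):
    """Horspool-style search of encoded pattern p in encoded text t.
    The shift is keyed on the last q base symbols of the window.
    Returns base-symbol offsets of matches that start at a code boundary."""
    m, n = len(p), len(t)
    if m == 0 or m > n:
        return []
    q = min(q, m)
    default = m - q + 1                # safe shift if the q-gram is absent
    shift = {}
    for e in range(q - 1, m - 1):      # q-grams ending before the last symbol
        # a later (larger) e overwrites, keeping the smallest safe shift
        shift[tuple(p[e - q + 1:e + 1])] = m - 1 - e
    hits = []
    j = m - 1                          # text index under the last pattern symbol
    while j < n:
        i = j - m + 1
        # accept only matches preceded by a stopper (or at the text start)
        if t[i:j + 1] == p and (i == 0 or t[i - 1] >= c):
            hits.append(i)
        j += shift.get(tuple(t[j - q + 1:j + 1]), default)
    return hits


if __name__ == "__main__":
    # 4-bit base symbols: 3 stoppers and 13 continuers (an arbitrary choice)
    text = "stopper compression stores stoppers and continuers"
    code = build_code(Counter(text), s=3, c=13)
    syms = encode(text, code)
    print(search(syms, encode("stop", code), c=13))
```

The boundary check is what the stopper/continuer split buys: without it, a symbol-level match could begin in the middle of a multi-symbol code and report a false occurrence.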
Keywords :
Huffman codes; binary sequences; data compression; search problems; string matching; text analysis; variable length codes; Boyer-Moore-Horspool algorithm; base symbols; compressed texts; continuers; shift function; static Huffman compression; stopper compression; string searching; variable length code; variable length sequences
Conference_Title :
Data Compression Conference, 2002. Proceedings. DCC 2002
Print_ISBN :
0-7695-1477-4
DOI :
10.1109/DCC.2002.1000012