Title :
Context tables: a tool for describing text compression algorithms
Author :
Yokoo, Hidetoshi
Author_Institution :
Dept. of Comput. Sci., Gunma Univ., Japan
Date :
30 Mar-1 Apr 1998
Abstract :
This paper introduces the notion of a context table, which provides a common basis for describing and analyzing text compression algorithms. A context table stores all substrings of a text according to their lexicographic order. Examples of compression algorithms that can be described in terms of context table concepts include LZ77 and the block-sorting algorithm. Since these algorithms are designed to work with an arbitrary source distribution, they can be expected to serve as entropy estimators. A primary use of a context table is to reveal this capability of estimating the entropy. A context table makes it easy to understand several characteristic quantities, including the recurrence time of a substring, the conditional recurrence time, and the length of the shortest unique substring. With the help of these concepts, some relations among apparently independent algorithms are established.
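Illustrative note (not from the paper): the abstract does not give the exact construction of a context table, so the following is only a minimal sketch under stated assumptions. It uses a plain lexicographically sorted list of suffixes as a stand-in for the table, the textbook sorted-rotations form of the block-sorting (Burrows-Wheeler) transform, and a brute-force search for the shortest unique substring. The function names and the `$` end marker are hypothetical choices made for this example.

```python
def sorted_suffix_table(text):
    """Return (start_index, suffix) pairs sorted lexicographically by suffix.

    Assumption: a sorted suffix list stands in for the paper's context
    table, whose exact definition is not given in the abstract.
    """
    return sorted(((i, text[i:]) for i in range(len(text))), key=lambda p: p[1])


def block_sort(text, end_marker="$"):
    """Block-sorting (Burrows-Wheeler) transform via sorted rotations.

    The end marker is assumed to be absent from `text` and to compare
    lower than every character in it.
    """
    s = text + end_marker
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(r[-1] for r in rotations)


def shortest_unique_length(text, i):
    """Length of the shortest substring starting at i that occurs exactly
    once in `text`, or None if every substring starting at i repeats."""
    for n in range(1, len(text) - i + 1):
        sub = text[i:i + n]
        # Count occurrences, allowing overlaps.
        count = sum(1 for j in range(len(text) - n + 1) if text[j:j + n] == sub)
        if count == 1:
            return n
    return None


if __name__ == "__main__":
    for index, suffix in sorted_suffix_table("banana"):
        print(index, suffix)
    print(block_sort("banana"))
    print(shortest_unique_length("banana", 1))
```

The quadratic-time loops are chosen for readability only; a practical implementation would more likely use a suffix array or suffix tree.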
Keywords :
data compression; entropy codes; sorting; source coding; word processing; LZ77; block sorting; block-sorting algorithm; conditional recurrence time; context sorting; context table; entropy estimator; lexicographic orders; recurrence time; shortest unique substring; source distribution; substrings; text compression algorithms; universal source code; Compression algorithms
Conference_Title :
Data Compression Conference, 1998. DCC '98. Proceedings
Conference_Location :
Snowbird, UT
Print_ISBN :
0-8186-8406-2
DOI :
10.1109/DCC.1998.672158