Title :
Universal lossless data compression with side information by using a conditional MPM grammar transform
Author :
Yang, En-Hui ; Kaltchenko, Alexei ; Kieffer, John C.
Author_Institution :
Dept. of Electr. & Comput. Eng., Waterloo Univ., Ont., Canada
fDate :
9/1/2001 12:00:00 AM
Abstract :
A grammar transform is a transformation that converts any data sequence to be compressed into a grammar from which the original data sequence can be fully reconstructed. In a grammar-based code, a data sequence is first converted into a grammar by a grammar transform and then losslessly encoded. Among several previously proposed grammar transforms is the multilevel pattern matching (MPM) grammar transform. In this paper, the MPM grammar transform is first extended to the case of side information known to both the encoder and decoder, yielding a conditional MPM (CMPM) grammar transform. A new, simple algorithm with linear time and space complexity is then proposed to implement the MPM and CMPM grammar transforms. Based on the CMPM grammar transform, a universal lossless data compression algorithm with side information is developed, which can asymptotically achieve the conditional entropy rate of any stationary, ergodic source pair. It is shown that the algorithm's worst-case redundancy per sample against the k-context conditional empirical entropy among all individual sequences of length n is upper-bounded by c(1/log n), where c is a constant. The proposed algorithm with side information is the first in the coming family of conditional grammar-based codes, whose expected high efficiency is due to the efficiency of the corresponding unconditional codes.
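The idea of a grammar transform can be illustrated with a toy sketch. The Python code below is not the MPM or CMPM transform from the paper; it is a simplified, hypothetical halving scheme in which every distinct block is given one grammar variable and repeated blocks reuse that variable. The names `to_grammar` and `expand` are assumptions introduced only for this illustration. It shows the one property the abstract states: the original sequence can be fully reconstructed from the grammar.

```python
def to_grammar(seq):
    """Toy halving grammar (illustrative only, not the paper's MPM transform).

    Returns (rules, start), where rules maps a variable index to either a
    single terminal symbol or a pair (left_var, right_var).
    """
    if not seq:
        raise ValueError("sequence must be non-empty")
    rules = {}   # variable index -> terminal symbol or (left_var, right_var)
    index = {}   # block content -> variable index, so repeats are shared

    def visit(block):
        if block in index:              # repeated pattern: reuse its variable
            return index[block]
        if len(block) == 1:
            body = block                # terminal symbol
        else:
            mid = len(block) // 2
            body = (visit(block[:mid]), visit(block[mid:]))
        var = len(rules)                # next unused variable index
        rules[var] = body
        index[block] = var
        return var

    start = visit(seq)
    return rules, start


def expand(rules, var):
    """Reconstruct the sequence generated by variable `var`."""
    body = rules[var]
    if isinstance(body, tuple):
        left, right = body
        return expand(rules, left) + expand(rules, right)
    return body


rules, start = to_grammar("abababab")
restored = expand(rules, start)
```

In this toy run the eight-symbol input is described by five rules, because the blocks "ab" and "abab" each occur more than once but are stored only once. A grammar-based code would then losslessly encode the rules rather than the raw sequence.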
Keywords :
binary sequences; computational complexity; convergence of numerical methods; data compression; entropy codes; grammars; pattern matching; signal reconstruction; transform coding; CMPM grammar transform; MPM grammar transform; asymptotical convergence; conditional MPM grammar transform; conditional empirical entropy; conditional entropy rate; conditional grammar-based codes; data sequence reconstruction; decoder; encoder; grammar-based code; linear-space complexity algorithm; linear-time complexity algorithm; multilevel pattern matching; random binary sequences; sequence length; side information; simulation results; stationary ergodic source pair; unconditional codes; universal lossless data compression; universal lossless data compression algorithm; worst case redundancy/sample; Councils; DNA; Data compression; Decoding; Entropy; Image coding; Image resolution; Information technology; Pattern matching; Sequences;
Journal_Title :
Information Theory, IEEE Transactions on