Title :
Implementing the context tree weighting method for text compression
Author :
Sadakane, Kunihiko ; Okazaki, Takumi ; Imai, Hiroshi
Author_Institution :
Dept. of Inf. Sci., Tokyo Univ., Japan
Abstract :
The context tree weighting (CTW) method is a universal compression algorithm for FSMX sources. Although it is expected to achieve good compression ratios in practice, it is difficult to implement, and in many cases implementations serve only to estimate the compression ratio. Willems and Tjalkens (1997) presented a practical implementation that uses conditional probabilities rather than block probabilities, but it applies only to binary-alphabet sequences. We extend the method to multi-alphabet sequences and show a simple implementation using PPM techniques. We also propose a method for optimizing a parameter of the context tree weighting method in the binary-alphabet case. Experimental results on texts and DNA sequences show that the performance of PPM can be improved by combining it with context tree weighting, and that DNA sequences can be compressed to less than 2.0 bpc (bits per character).
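For orientation, the sketch below shows the textbook binary CTW recursion (Willems, Shtarkov and Tjalkens) that the abstract builds on. At each context node s it mixes the Krichevsky-Trofimov (KT) estimate P_e with the product of the children's weighted probabilities, P_w = P_e/2 + P_w(0s) P_w(1s)/2, and uses P_w = P_e at maximum depth. It is a minimal sketch under stated assumptions, not the authors' implementation: it computes exact block probabilities with Python fractions rather than the conditional-probability scheme of Willems and Tjalkens, uses the fixed weight 1/2 (the parameter optimization proposed in the paper is not reproduced), treats the first `depth` bits as known past context, and does not cover the multi-alphabet PPM-based extension. The function names `kt_block_prob` and `ctw_block_prob` are hypothetical.

```python
import math
from fractions import Fraction


def kt_block_prob(a: int, b: int) -> Fraction:
    """KT block probability of a sequence with a zeros and b ones.

    The KT estimate depends only on the counts, so zeros are fed first,
    then ones, using P(next = x) = (n_x + 1/2) / (n_0 + n_1 + 1).
    """
    p = Fraction(1)
    zeros = ones = 0
    for _ in range(a):
        p *= Fraction(2 * zeros + 1, 2 * (zeros + ones + 1))
        zeros += 1
    for _ in range(b):
        p *= Fraction(2 * ones + 1, 2 * (zeros + ones + 1))
        ones += 1
    return p


def ctw_block_prob(bits: list[int], depth: int) -> Fraction:
    """Exact binary CTW block probability (hypothetical helper, weight 1/2).

    Assumption: bits[:depth] serve as the initial context and are not coded.
    """
    # counts maps a context suffix (most recent bit first) to [zeros, ones].
    counts: dict[tuple[int, ...], list[int]] = {}
    for t in range(depth, len(bits)):
        ctx = tuple(bits[t - 1 - i] for i in range(depth))
        for d in range(depth + 1):  # update every node on the context path
            counts.setdefault(ctx[:d], [0, 0])[bits[t]] += 1

    def weighted(node: tuple[int, ...]) -> Fraction:
        if node not in counts:  # unseen subtree: empty sequence has probability 1
            return Fraction(1)
        a, b = counts[node]
        pe = kt_block_prob(a, b)
        if len(node) == depth:  # leaf of the context tree
            return pe
        return pe / 2 + weighted(node + (0,)) * weighted(node + (1,)) / 2

    return weighted(())


if __name__ == "__main__":
    bits = [0, 1, 1, 0, 1, 1, 0, 1, 1, 0]
    p = ctw_block_prob(bits, depth=2)
    # -log2 of the block probability is the ideal code length.
    print(f"ideal code length: {-math.log2(p):.3f} bits")
```

Exact fractions keep the example easy to check (the weighted probabilities of all continuations of a fixed context sum to exactly 1), but they grow quickly with sequence length; a practical coder would instead work with conditional probabilities or in the log domain, which is the implementation issue the abstract refers to.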
Keywords :
data compression; optimisation; sequences; text analysis; tree data structures; DNA sequences; FSMX sources; PPM techniques; binary alphabet; conditional probabilities; context tree weighting; multi-alphabet sequences; parameter optimization; performance; text compression; universal compression algorithm; Compression algorithms; DNA; Optimization methods; Sequences;
Conference_Title :
Data Compression Conference, 2000. Proceedings. DCC 2000
Conference_Location :
Snowbird, UT
Print_ISBN :
0-7695-0592-9
DOI :
10.1109/DCC.2000.838152