Title :
Lossless Compression Based on the Sequence Memoizer
Author :
Gasthaus, Jan ; Wood, Frank ; Teh, Yee Whye
Author_Institution :
Gatsby Comput. Neurosci. Unit, UCL, London, UK
Abstract :
In this work we describe a sequence compression method that combines a Bayesian nonparametric sequence model with entropy encoding. The model, a hierarchy of Pitman-Yor processes of unbounded depth previously proposed by Wood et al. [16] in the context of language modelling, captures long-range dependencies by conditioning on contexts of unbounded length. We show that incremental approximate inference can be performed in this model, thereby allowing it to be used in a text compression setting. The resulting compressor reliably outperforms several PPM variants on many types of data, and is particularly effective on data that exhibits power-law properties.
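A minimal sketch of the coupling the abstract describes may help: an adaptive predictive model feeds symbol probabilities to an entropy coder, and the model is updated incrementally after each symbol. The Python snippet below is not from the paper; a simple Krichevsky-Trofimov (add-1/2) estimator stands in for the sequence memoizer's hierarchical Pitman-Yor model, and the ideal code length (-log2 p bits per symbol, which an arithmetic coder approaches to within a few bits overall) is summed instead of emitting an actual bitstream.

import math
from collections import defaultdict

def ideal_code_length(data: bytes, alphabet_size: int = 256) -> float:
    """Bits an entropy coder would spend when driven by an adaptive model.

    The predict-code-update loop is the pattern the paper describes; the
    KT estimator here is only a stand-in for the sequence memoizer.
    """
    counts = defaultdict(int)   # symbol -> occurrence count so far
    total = 0                   # number of symbols seen so far
    bits = 0.0
    for symbol in data:
        # Predictive probability of the observed symbol (add-1/2 smoothing).
        p = (counts[symbol] + 0.5) / (total + 0.5 * alphabet_size)
        bits += -math.log2(p)   # an arithmetic coder spends ~ -log2 p bits
        counts[symbol] += 1     # incremental model update
        total += 1
    return bits

if __name__ == "__main__":
    text = b"abracadabra" * 100
    print(f"{ideal_code_length(text) / 8:.1f} bytes vs {len(text)} raw bytes")

Replacing the stand-in estimator with the hierarchical Pitman-Yor predictive distribution, maintained by incremental approximate inference, would recover the structure of the compressor described above.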
Keywords :
Bayes methods; data compression; text analysis; Bayesian nonparametric sequence model; PPM variants; Pitman-Yor processes; entropy encoding; incremental approximate inference; lossless compression; power law properties; sequence memoizer; text compression; context modeling; decoding; entropy; inference algorithms; predictive models; statistics
Conference_Titel :
Data Compression Conference (DCC), 2010
Conference_Location :
Snowbird, UT
Print_ISBN :
978-1-4244-6425-8
ISSN :
1068-0314
DOI :
10.1109/DCC.2010.36