DocumentCode :
2188989
Title :
Lossless Compression Based on the Sequence Memoizer
Author :
Gasthaus, Jan ; Wood, Frank ; Teh, Yee Whye
Author_Institution :
Gatsby Comput. Neurosci. Unit, UCL, London, UK
fYear :
2010
fDate :
24-26 March 2010
Firstpage :
337
Lastpage :
345
Abstract :
In this work we describe a sequence compression method that combines a Bayesian nonparametric sequence model with entropy encoding. The model, a hierarchy of Pitman-Yor processes of unbounded depth previously proposed by Wood et al. [16] in the context of language modelling, captures long-range dependencies by conditioning on contexts of unbounded length. We show that incremental approximate inference can be performed in this model, allowing it to be used in a text compression setting. The resulting compressor reliably outperforms several PPM variants on many types of data, and is particularly effective on data that exhibits power-law properties.
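A minimal sketch of the pipeline the abstract describes: an adaptive sequence model supplies the conditional probabilities p(symbol | context), and an entropy coder turns those probabilities into bits. The Sequence Memoizer itself requires hierarchical Pitman-Yor inference, so a simple Krichevsky-Trofimov-smoothed order-1 byte model stands in for it here, and the entropy-coding stage is idealised as the Shannon code length (the sum of -log2 p), which a real arithmetic coder approaches to within a few bits. The names below (AdaptiveOrder1Model, ideal_code_length_bits) are hypothetical illustrations, not identifiers from the paper.

import math
from collections import defaultdict

ALPHABET = 256  # byte-oriented model

class AdaptiveOrder1Model:
    """Order-1 adaptive model: a stand-in for the Sequence Memoizer's
    unbounded-context predictions (for illustration only)."""

    def __init__(self):
        self.counts = defaultdict(lambda: [0] * ALPHABET)  # counts[ctx][sym]
        self.totals = defaultdict(int)                     # total count per context

    def prob(self, ctx, sym):
        # Krichevsky-Trofimov (add-1/2) estimate of p(sym | ctx)
        return (self.counts[ctx][sym] + 0.5) / (self.totals[ctx] + ALPHABET / 2)

    def update(self, ctx, sym):
        self.counts[ctx][sym] += 1
        self.totals[ctx] += 1

def ideal_code_length_bits(data):
    """Bits an ideal entropy coder would emit when driven by the model's
    incrementally updated predictions (encode first, then update)."""
    model = AdaptiveOrder1Model()
    ctx = -1   # sentinel context for the first symbol
    bits = 0.0
    for sym in data:
        bits += -math.log2(model.prob(ctx, sym))  # Shannon cost of this symbol
        model.update(ctx, sym)                    # decoder performs the same update
        ctx = sym
    return bits

if __name__ == "__main__":
    text = b"abracadabra " * 200
    bits = ideal_code_length_bits(text)
    print("%d bytes -> %.1f bytes ideal (%.2f bits/byte)"
          % (len(text), bits / 8, bits / len(text)))

Because the model is updated only after each symbol is coded, a decoder running the same model stays synchronised with the encoder; this encode-then-update discipline is the same one followed by PPM-family compressors and by the incremental inference scheme the abstract describes.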
Keywords :
Bayes methods; data compression; text analysis; Bayesian nonparametric sequence model; PPM variants; Pitman-Yor processes; entropy encoding; incremental approximate inference; lossless compression; power law properties; sequence memoizer; text compression setting; Bayesian methods; Context modeling; Data compression; Decoding; Entropy; Inference algorithms; Predictive models; Statistics;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Compression Conference (DCC), 2010
Conference_Location :
Snowbird, UT
ISSN :
1068-0314
Print_ISBN :
978-1-4244-6425-8
Type :
conf
DOI :
10.1109/DCC.2010.36
Filename :
5453479