DocumentCode
2188989
Title
Lossless Compression Based on the Sequence Memoizer
Author
Gasthaus, Jan; Wood, Frank; Teh, Yee Whye
Author_Institution
Gatsby Comput. Neurosci. Unit, UCL, London, UK
fYear
2010
fDate
24-26 March 2010
Firstpage
337
Lastpage
345
Abstract
In this work we describe a sequence compression method that combines a Bayesian nonparametric sequence model with entropy encoding. The model, a hierarchy of Pitman-Yor processes of unbounded depth previously proposed by Wood et al. [16] in the context of language modelling, captures long-range dependencies by conditioning on contexts of unbounded length. We show that incremental approximate inference can be performed in this model, allowing it to be used in a text compression setting. The resulting compressor reliably outperforms several PPM variants on many types of data, and is particularly effective on data that exhibits power-law properties.
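The abstract describes coupling a sequential predictive model with an entropy coder. The Python sketch below (not part of the original record) illustrates that coupling under simplifying assumptions: a fixed-order context model with single-level Pitman-Yor-style absolute discounting stands in for the paper's unbounded-depth hierarchy, and the ideal code length (the sum of -log2 of the predictive probabilities) stands in for an actual arithmetic coder. Names such as predictive_prob and ideal_code_length are illustrative, not from the paper.

import math
from collections import defaultdict

def predictive_prob(counts, total, symbol, alphabet_size=256, discount=0.5):
    # Single-level Pitman-Yor-style (absolute-discounting) predictive
    # probability for one context, backing off to a uniform base distribution.
    # Only a stand-in for the hierarchical, unbounded-depth model in the paper.
    base = 1.0 / alphabet_size
    if total == 0:
        return base
    distinct = len(counts)  # number of distinct symbols seen in this context
    p = max(counts.get(symbol, 0) - discount, 0.0) / total
    p += (discount * distinct / total) * base
    return p

def ideal_code_length(data, order=2, discount=0.5):
    # Sum of -log2 P(symbol | context) under the adaptive model above.
    # An arithmetic coder driven by the same predictive probabilities would
    # achieve this length to within a few bits, which is the sense in which
    # the predictive model and the entropy coder are combined.
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    bits = 0.0
    for i, sym in enumerate(data):
        ctx = data[max(0, i - order):i]
        p = predictive_prob(counts[ctx], totals[ctx], sym, 256, discount)
        bits += -math.log2(p)
        counts[ctx][sym] += 1  # model is updated after coding each symbol
        totals[ctx] += 1
    return bits

if __name__ == "__main__":
    text = b"abracadabra abracadabra abracadabra"
    print("%.1f ideal bits vs %d raw bits" % (ideal_code_length(text), 8 * len(text)))

The fixed-order context and the ideal-code-length shortcut are simplifications; the paper's contribution is precisely that the conditioning contexts are unbounded and that approximate inference can be run incrementally so the decoder can reproduce the encoder's predictions.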
Keywords
Bayes methods; data compression; text analysis; Bayesian nonparametric sequence model; PPM variants; Pitman-Yor processes; entropy encoding; incremental approximate inference; lossless compression; power law properties; sequence memoizer; text compression setting; Bayesian methods; Context modeling; Data compression; Decoding; Entropy; Inference algorithms; Predictive models; Statistics
fLanguage
English
Publisher
ieee
Conference_Titel
Data Compression Conference (DCC), 2010
Conference_Location
Snowbird, UT
ISSN
1068-0314
Print_ISBN
978-1-4244-6425-8
Type
conf
DOI
10.1109/DCC.2010.36
Filename
5453479