Title :
Lossless Compression Based on the Sequence Memoizer
Author :
Gasthaus, Jan ; Wood, Frank ; Teh, Yee Whye
Author_Institution :
Gatsby Comput. Neurosci. Unit, UCL, London, UK
Abstract :
In this work we describe a sequence compression method that combines a Bayesian nonparametric sequence model with entropy encoding. The model, a hierarchy of Pitman-Yor processes of unbounded depth previously proposed by Wood et al. [16] in the context of language modelling, captures long-range dependencies by conditioning on contexts of unbounded length. We show that incremental approximate inference can be performed in this model, thereby allowing it to be used in a text compression setting. The resulting compressor reliably outperforms several PPM variants on many types of data, and is particularly effective on data that exhibits power-law properties.
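A minimal sketch of the coupling the abstract describes may help: an adaptive predictive model feeds symbol probabilities to an entropy coder, and the model is updated incrementally after each symbol. The Python snippet below is not from the paper; a simple Krichevsky-Trofimov (add-1/2) estimator stands in for the sequence memoizer's hierarchical Pitman-Yor model, and the ideal code length (-log2 p bits per symbol, which an arithmetic coder approaches to within a few bits overall) is summed instead of emitting an actual bitstream.

import math
from collections import defaultdict

def ideal_code_length(data: bytes, alphabet_size: int = 256) -> float:
    """Bits an entropy coder would spend when driven by an adaptive model.

    The predict-code-update loop is the pattern the paper describes; the
    KT estimator here is only a stand-in for the sequence memoizer.
    """
    counts = defaultdict(int)   # symbol -> occurrence count so far
    total = 0                   # number of symbols seen so far
    bits = 0.0
    for symbol in data:
        # Predictive probability of the observed symbol (add-1/2 smoothing).
        p = (counts[symbol] + 0.5) / (total + 0.5 * alphabet_size)
        bits += -math.log2(p)   # an arithmetic coder spends ~ -log2 p bits
        counts[symbol] += 1     # incremental model update
        total += 1
    return bits

if __name__ == "__main__":
    text = b"abracadabra" * 100
    print(f"{ideal_code_length(text) / 8:.1f} bytes vs {len(text)} raw bytes")

Replacing the stand-in estimator with the hierarchical Pitman-Yor predictive distribution, maintained by incremental approximate inference, would recover the structure of the compressor described above.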
Keywords :
Bayes methods; data compression; text analysis; Bayesian nonparametric sequence model; PPM variants; Pitman-Yor processes; entropy encoding; incremental approximate inference; lossless compression; power law properties; sequence memoizer; text compression; context modeling; decoding; entropy; inference algorithms; predictive models; statistics
Conference_Titel :
Data Compression Conference (DCC), 2010
Conference_Location :
Snowbird, UT
Print_ISBN :
978-1-4244-6425-8
ISSN :
1068-0314
DOI :
10.1109/DCC.2010.36