  • DocumentCode
    2188989
  • Title
    Lossless Compression Based on the Sequence Memoizer
  • Author
    Gasthaus, Jan; Wood, Frank; Teh, Yee Whye
  • Author_Institution
    Gatsby Comput. Neurosci. Unit, UCL, London, UK
  • fYear
    2010
  • fDate
    24-26 March 2010
  • Firstpage
    337
  • Lastpage
    345
  • Abstract
    In this work we describe a sequence compression method that combines a Bayesian nonparametric sequence model with entropy encoding. The model, a hierarchy of Pitman-Yor processes of unbounded depth previously proposed by Wood et al. [16] in the context of language modelling, captures long-range dependencies by allowing conditioning contexts of unbounded length. We show that incremental approximate inference can be performed in this model, allowing it to be used in a text compression setting. The resulting compressor reliably outperforms several PPM variants on many types of data, and is particularly effective on data that exhibits power-law properties. (An illustrative sketch of the model-plus-entropy-coder coupling follows this record.)
  • Keywords
    Bayes methods; data compression; simulation languages; text analysis; Bayesian nonparametric sequence model; PPM variants; Pitman-Yor processes; entropy encoding; incremental approximate inference; lossless compression; power law properties; sequence memoizer; text compression setting; Bayesian methods; Context modeling; Data compression; Decoding; Entropy; Inference algorithms; Predictive models; Statistics
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Title
    Data Compression Conference (DCC), 2010
  • Conference_Location
    Snowbird, UT
  • ISSN
    1068-0314
  • Print_ISBN
    978-1-4244-6425-8
  • Electronic_ISBN
    1068-0314
  • Type
    conf
  • DOI
    10.1109/DCC.2010.36
  • Filename
    5453479
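
Illustrative sketch of the approach described in the abstract: the compressor couples an adaptive predictive sequence model with an entropy coder, coding each symbol under the model's current predictive distribution and then updating the model incrementally. The minimal Python sketch below uses a simple fixed-order back-off byte model as a hypothetical stand-in for the paper's unbounded-depth hierarchical Pitman-Yor model (the sequence memoizer), and reports the ideal arithmetic-coding cost, i.e. the accumulated -log2 p(symbol | context), rather than implementing a bit-exact coder. All names and the smoothing rule are assumptions for illustration, not taken from the paper's implementation.

```python
import math
from collections import defaultdict


class BackoffContextModel:
    """A simple fixed-order byte model with ad hoc back-off smoothing.

    Hypothetical stand-in for the unbounded-depth hierarchical Pitman-Yor
    model (sequence memoizer) described in the abstract; it only illustrates
    how an adaptive model can drive an entropy coder.
    """

    def __init__(self, max_order=3, alphabet_size=256):
        self.max_order = max_order
        self.alphabet_size = alphabet_size
        # counts[k][context][symbol]: times `symbol` followed a length-k context
        self.counts = [defaultdict(lambda: defaultdict(int))
                       for _ in range(max_order + 1)]

    def predict(self, history, symbol):
        """Return p(symbol | history), blending contexts from short to long."""
        p = 1.0 / self.alphabet_size                  # base case: uniform over bytes
        for k in range(self.max_order + 1):
            ctx = bytes(history[-k:]) if k else b""   # length-k suffix of the history
            table = self.counts[k][ctx]
            total = sum(table.values())
            lam = total / (total + 1.0)               # trust longer contexts as counts grow
            empirical = table[symbol] / total if total else 0.0
            p = (1.0 - lam) * p + lam * empirical
        return p

    def update(self, history, symbol):
        """Record that `symbol` followed `history`, for all context lengths."""
        for k in range(self.max_order + 1):
            ctx = bytes(history[-k:]) if k else b""
            self.counts[k][ctx][symbol] += 1


def ideal_compressed_bits(data, max_order=3):
    """Online coding cost in bits: each byte is scored *before* the model is
    updated with it, as an adaptive compressor/decompressor would proceed."""
    model = BackoffContextModel(max_order)
    history = bytearray()
    bits = 0.0
    for b in data:
        bits += -math.log2(model.predict(history, b))  # ideal arithmetic-code length
        model.update(history, b)
        history.append(b)
    return bits


if __name__ == "__main__":
    text = b"abracadabra abracadabra abracadabra " * 20
    bits = ideal_compressed_bits(text)
    print(f"{len(text)} bytes -> {bits / 8:.1f} bytes (ideal), "
          f"{bits / len(text):.2f} bits/byte")
```

Running the demo on a small repetitive byte string should show the per-symbol cost falling well below 8 bits/byte as context counts accumulate, mirroring the incremental (online) inference setting the abstract refers to; the paper's model replaces the fixed-order counts with Pitman-Yor statistics over contexts of unbounded length.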