  • DocumentCode
    147056
  • Title
    Better Compression through Better List Update Algorithms
  • Author
    Kamali, Saman ; Lopez Ortiz, Alejandro
  • Author_Institution
    Cheriton Sch. of Comput. Sci., Univ. of Waterloo, Waterloo, ON, Canada
  • fYear
    2014
  • fDate
    26-28 March 2014
  • Firstpage
    372
  • Lastpage
    381
  • Abstract
    List update is a key step in Burrows-Wheeler transform (BWT) compression. Previous work has shown that careful study of the list update step leads to better BWT compression. Surprisingly, the theoretical study of list update algorithms for compression has lagged behind their use in practice. More precisely, the standard Sleator-Tarjan model for list update assumes a "linear cost-of-access" model, whereas compression incurs a logarithmic cost of access: accessing the item at position i in the list costs Θ(i) in the standard model but Θ(log i) in compression applications. These two models have been shown, in general, not to be equivalent. This paper makes two contributions: (1) we give the first theoretical proof that the commonly used Move-To-Front (MTF) algorithm performs well under the logarithmic cost-of-access model of compression; this has long been observed in practice, but a formal proof under the logarithmic cost model was missing until now; and (2) we further refine the online compression model to reflect its use in compression by applying the recently developed "online algorithms with advice" model. The advice model was initially a purely theoretical construct in which the online algorithm has access to an all-powerful oracle during the computation. We show that, surprisingly, this seemingly unrealistic model can be used to produce better multi-pass compression algorithms. More precisely, we introduce an "almost-online" list update algorithm, which we term BIB, that yields a compression scheme superior to schemes using standard online algorithms, in particular MTF and TIMESTAMP. For example, on the files of the standard Canterbury Corpus, the average compression ratio of the scheme that uses BIB is 33.66, while those of the schemes that use MTF and TIMESTAMP are 34.25 and 36.30, respectively.
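    To make the two cost models concrete, the following is a minimal Python sketch (not the authors' code) of MTF as applied after the BWT: each symbol is replaced by its 1-based position i in a self-organizing list and is then moved to the front. The names `mtf_encode`, `linear_cost`, `gamma_bits` and `log_cost` are hypothetical. Charging each access the length of its Elias gamma codeword, 2⌊log2 i⌋ + 1 bits, is an assumed stand-in for the Θ(log i) cost; the encoder actually used in the paper may differ.

```python
from typing import Hashable, Iterable, List, Sequence


def mtf_encode(data: Iterable[Hashable], alphabet: Sequence[Hashable]) -> List[int]:
    """Move-To-Front: emit each symbol's 1-based list position, then move it to the front."""
    lst = list(alphabet)
    out: List[int] = []
    for sym in data:
        i = lst.index(sym)         # 0-based position of the accessed item
        out.append(i + 1)          # 1-based index i, as in the list update model
        lst.insert(0, lst.pop(i))  # move the accessed item to the front
    return out


def linear_cost(indices: Iterable[int]) -> int:
    """Standard (linear cost-of-access) model: accessing position i costs i."""
    return sum(indices)


def gamma_bits(i: int) -> int:
    """Bits in the Elias gamma code of i >= 1: 2*floor(log2 i) + 1 (assumed log cost)."""
    if i < 1:
        raise ValueError("index must be >= 1")
    return 2 * (i.bit_length() - 1) + 1


def log_cost(indices: Iterable[int]) -> int:
    """Compression model: accessing position i costs Theta(log i) bits (here, gamma code length)."""
    return sum(gamma_bits(i) for i in indices)


if __name__ == "__main__":
    idx = mtf_encode("abracadabra", "abcdr")
    print(idx, linear_cost(idx), log_cost(idx))
```

    On "abracadabra" over the alphabet "abcdr", this sketch emits [1, 2, 5, 3, 4, 2, 5, 2, 5, 5, 3], with linear cost 37 and gamma-code cost 41 bits. Under the linear model each access costs its index, while under the logarithmic model a deep position such as 5 costs only a few more bits than position 2.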
  • Keywords
    data compression; transforms; BWT compression; Burrows Wheeler transform; MTF; Move-To-Front; better compression; better list update algorithms; compression applications; linear cost-of-access model; logarithmic cost; multipass compression algorithms; online algorithms; online compression model; Algorithm design and analysis; Compression algorithms; Computational modeling; Context; Indexes; Optimized production technology; Standards; competitive analysis; list update
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Compression Conference (DCC), 2014
  • Conference_Location
    Snowbird, UT
  • ISSN
    1068-0314
  • Type
    conf
  • DOI
    10.1109/DCC.2014.86
  • Filename
    6824445