DocumentCode :
147056
Title :
Better Compression through Better List Update Algorithms
Author :
Kamali, Saman ; Lopez Ortiz, Alejandro
Author_Institution :
Cheriton Sch. of Comput. Sci., Univ. of Waterloo, Waterloo, ON, Canada
fYear :
2014
fDate :
26-28 March 2014
Firstpage :
372
Lastpage :
381
Abstract :
List update is a key step in Burrows-Wheeler transform (BWT) compression. Previous work has shown that careful study of the list update step leads to better BWT compression. Surprisingly, the theoretical study of list update algorithms for compression has lagged behind their use in practice. More precisely, the standard model by Sleator and Tarjan for list update considers a "linear cost-of-access" model, while compression incurs a logarithmic cost of access; i.e., accessing item i in the list has cost Θ(i) in the standard model but Θ(log i) in compression applications. These models have been shown, in general, not to be equivalent. This paper makes two contributions: (1) we give the first theoretical proof that the commonly used Move-To-Front (MTF) algorithm performs well under the logarithmic cost-of-access model of compression. This has long been known in practice, but a formal proof under the logarithmic cost model was missing until now; (2) we further refine the online compression model to reflect its use in compression by applying the recently developed "online algorithms with advice" model. This advice model was initially a purely theoretical construct in which the online algorithm has access to an all-powerful oracle during the computation. We show that, surprisingly, this seemingly unrealistic model can be used to produce better multi-pass compression algorithms. Specifically, we introduce an "almost-online" list update algorithm, which we term BIB, that yields a compression scheme superior to schemes using standard online algorithms, in particular MTF and TIMESTAMP. For example, for the files in the standard Canterbury Corpus, the compression ratio of the scheme that uses BIB is 33.66 on average, while the compression ratios of the schemes that use MTF and TIMESTAMP are 34.25 and 36.30, respectively.
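To make the contrast between the two cost models concrete, the sketch below replays a request sequence with MTF and with deterministic TIMESTAMP, records the 1-indexed position at which each requested item is found, and totals the cost under a linear model (cost i) and a logarithmic model. It is a minimal illustration under stated assumptions, not the paper's method: the logarithmic cost is modeled as the bit length of i (1 + floor(log2 i)), a hypothetical stand-in for any Θ(log i) code length; TIMESTAMP follows Albers' deterministic definition from the list update literature; and all function names are hypothetical. BIB is not reproduced here because the abstract does not describe it.

```python
from typing import Callable, Hashable, List, Sequence


def linear_cost(i: int) -> int:
    # Standard (Sleator-Tarjan) model: accessing position i costs i.
    return i


def log_cost(i: int) -> int:
    # Assumed logarithmic model: bits needed to write i in binary,
    # i.e. 1 + floor(log2 i). A hypothetical stand-in for Theta(log i).
    return i.bit_length()


def mtf_positions(initial: Sequence[Hashable],
                  requests: Sequence[Hashable]) -> List[int]:
    """Move-To-Front: record each item's 1-indexed position, then move it to the front."""
    lst = list(initial)
    out = []
    for x in requests:
        i = lst.index(x)
        out.append(i + 1)
        lst.insert(0, lst.pop(i))
    return out


def timestamp_positions(initial: Sequence[Hashable],
                        requests: Sequence[Hashable]) -> List[int]:
    """Deterministic TIMESTAMP (as defined by Albers): after accessing x,
    insert x in front of the first item y preceding x that was requested
    at most once since the last request to x. Do nothing if x is requested
    for the first time or no such y exists."""
    lst = list(initial)
    history: List[Hashable] = []
    out = []
    for x in requests:
        i = lst.index(x)
        out.append(i + 1)
        if x in history:
            # Index of the most recent earlier request to x.
            last = len(history) - 1 - history[::-1].index(x)
            since = history[last + 1:]
            for j in range(i):
                if since.count(lst[j]) <= 1:
                    lst.insert(j, lst.pop(i))  # j < i, so lst[j] is unaffected by the pop
                    break
        history.append(x)
    return out


def total_cost(positions: Sequence[int], cost: Callable[[int], int]) -> int:
    """Sum the access cost of every request under the given cost model."""
    return sum(cost(p) for p in positions)


if __name__ == "__main__":
    initial, requests = "abc", "cccbba"
    for name, algo in [("MTF", mtf_positions), ("TIMESTAMP", timestamp_positions)]:
        pos = algo(initial, requests)
        print(name, pos, total_cost(pos, linear_cost), total_cost(pos, log_cost))
```

For the initial list `abc` and requests `cccbba`, MTF finds the items at positions [3, 1, 1, 3, 1, 3] (linear cost 12, logarithmic cost 9), while TIMESTAMP finds them at [3, 3, 1, 3, 3, 3] (linear cost 16, logarithmic cost 11). Under the logarithmic model a far-back access costs only a few more bits than a front access, which is why rankings of algorithms need not carry over from the linear model.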
Keywords :
data compression; transforms; BWT compression; Burrows Wheeler transform; MTF; Move-To-Front; better compression; better list update algorithms; compression applications; linear cost-of-access model; logarithmic cost; multipass compression algorithms; online algorithms; online compression model; Algorithm design and analysis; Compression algorithms; Computational modeling; Context; Indexes; Optimized production technology; Standards; competitive analysis; data compression; list update; online algorithms;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Compression Conference (DCC), 2014
Conference_Location :
Snowbird, UT
ISSN :
1068-0314
Type :
conf
DOI :
10.1109/DCC.2014.86
Filename :
6824445
Link To Document :