DocumentCode :
249315
Title :
Storing a Collection of Differentially Compressed Files Recursively
Author :
Molfetas, Angelos ; Wirth, Andreas ; Zobel, Justin
Author_Institution :
Dept. of Comput. & Inf. Syst., Univ. of Melbourne, Melbourne, VIC, Australia
fYear :
2014
fDate :
June 27 2014-July 2 2014
Firstpage :
88
Lastpage :
95
Abstract :
A collection of files can be compressed by storing each file in the collection as a delta file: one file refers to several other files. The copy instructions in a delta file could reference other files either in their encoded forms or in their (original) unencoded forms. Because files are stored compressed, the latter approach suffers from a blowout in the number of files that need to be decoded to retrieve a single file. The process is recursive: decoding a file requires referenced files be decoded, which in turn may require the files they reference to be decoded, and so on. Hence the second method is slower than the first, as it generates large chains of dependencies. However, the second method could of course identify larger common strings between files, and so compress the collection better. These two schemes are compared to determine whether the compression gain is sufficient to justify the blowout in latency. Furthermore, we implement a threshold on the level of recursion, and show that this unfortunately results in rather poor compression.
Keywords :
data compression; file organisation; compression gain; delta file; differentially compressed file; file collection storage; file decoding; latency blowout; Decoding; Electronic publishing; Encoding; Encyclopedias; Internet; Kernel; Differential compression
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Big Data (BigData Congress), 2014 IEEE International Congress on
Conference_Location :
Anchorage, AK
Print_ISBN :
978-1-4799-5056-0
Type :
conf
DOI :
10.1109/BigData.Congress.2014.22
Filename :
6906765