Regional Information Center for Science and Technology - Using Inter-file Similarity to Improve Intra-file Compression

DocumentCode :

249341

Title :

Using Inter-file Similarity to Improve Intra-file Compression

Author :

Molfetas, Angelos ; Wirth, Andreas ; Zobel, Justin

Author_Institution :

Dept. of Comput. & Inf. Syst., Univ. of Melbourne, Melbourne, VIC, Australia

fYear :

2014

fDate :

June 27 2014-July 2 2014

Firstpage :

192

Lastpage :

199

Abstract :

In storage systems with vast numbers of files, compression techniques should exploit inter-file similarity, while allowing for near-atomic access to individual files. In differential compression, collections of files are compressed by identifying shared common strings. Therefore, some files are represented largely by references to strings in other files. In addition, a file in the collection can be (further) compressed by identifying common strings within the file itself. At the cost of decompression latency, but with a possible gain in compression effectiveness, an LZ-style within-file compressor could resolve these references to other files. To quantify the compression gain, we experiment with a variety of file collections, from emails to source code, and test against multiple measures. If the LZ scheme honors the inter-file references, then there is only minimal improvement. If the LZ algorithm replaces inter-file references with intra-file references, then up to 3% compression improvement is witnessed for mildly similar files, and over 200% improvement for highly similar files.
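
The distinction the abstract draws between inter-file and intra-file references can be illustrated with a small sketch. This is not the authors' algorithm: it is a naive greedy LZ-style factorization, and the function names (`factorize`, `decode`), the tuple layout of the factors, and the `min_len` threshold are all assumptions made for illustration. Each factor is either a literal character, an "inter" reference into a separate reference file, or an "intra" reference into the already-encoded part of the same file.

```python
def factorize(target, reference="", min_len=3):
    """Greedy LZ-style factorization (illustrative sketch, quadratic time).

    Factors are ("lit", char), ("inter", offset, length) pointing into
    `reference`, or ("intra", offset, length) pointing into the part of
    `target` that has already been encoded (non-overlapping copies only).
    """
    factors = []
    i = 0
    while i < len(target):
        best, best_len = None, 0
        # Candidate sources: the other file, then this file's own prefix.
        for kind, text, limit in (("inter", reference, len(reference)),
                                  ("intra", target, i)):
            for start in range(limit):
                length = 0
                while (i + length < len(target)
                       and start + length < limit
                       and text[start + length] == target[i + length]):
                    length += 1
                if length > best_len:
                    best, best_len = (kind, start, length), length
        if best_len >= min_len:
            factors.append(best)
            i += best_len
        else:
            # Matches shorter than min_len are emitted as single literals.
            factors.append(("lit", target[i]))
            i += 1
    return factors


def decode(factors, reference=""):
    """Rebuild the target; "inter" factors require access to `reference`."""
    out = ""
    for factor in factors:
        if factor[0] == "lit":
            out += factor[1]
        elif factor[0] == "inter":
            out += reference[factor[1]:factor[1] + factor[2]]
        else:  # "intra": copy from text already decoded
            out += out[factor[1]:factor[1] + factor[2]]
    return out


reference_file = "the quick brown fox"
target_file = "the quick red fox, the quick red fox"

with_ref = factorize(target_file, reference_file)
without_ref = factorize(target_file)

assert decode(with_ref, reference_file) == target_file
assert decode(without_ref) == target_file
print(len(with_ref), len(without_ref))
```

In this toy input, the factorization that may point into the reference file uses fewer factors, but decoding it needs the reference file to be available, which is the decompression-latency trade-off the abstract mentions. A factorization using only literals and "intra" factors can be decoded from the file alone.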

Keywords :

data compression; source code (software); storage management; LZ algorithm; LZ-style within-file compressor; compression effectiveness; decompression latency; differential compression; e-mails; file collection compression; interfile similarity; intrafile compression; intrafile references; near-atomic access; shared common strings; source code; storage systems; Compression algorithms; Dictionaries; Electronic mail; Encoding; Encyclopedias; Indexes; Measurement; Differential compression; LZ factorization;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Big Data (BigData Congress), 2014 IEEE International Congress on

Conference_Location :

Anchorage, AK

Print_ISBN :

978-1-4799-5056-0

Type :

conf

DOI :

10.1109/BigData.Congress.2014.35

Filename :

6906778

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=249341