Title :
An approximation to the greedy algorithm for differential compression of very large files
Author :
Agarwal, Ramesh C. ; Amalapurapu, Suchitra ; Jain, Shaili
Author_Institution :
IBM Almaden Res. Center, San Jose, CA, USA
Abstract :
This paper presents a new differential compression algorithm that combines the hash value and suffix array technique. In this algorithm, hash values for every block of the reference file is computed. Next, suffix arrays on these block hash values are computed. This algorithm finds the longest matches for every offset of the version file. This algorithm depends upon the utilization of three new data structures, the block hash table, the quick index array, and the pointer array, which improves the run-time of the algorithm, and compress very large files.
Keywords :
algorithm theory; block codes; data compression; data structures; file organisation; algorithm run-time; block hash table; data structure; differential compression algorithm; greedy algorithm; hash value; large file compression; pointer array; quick index array; reference file computation; suffix array technique; version file matching; Compression algorithms; Computational biology; Data compression; Data structures; Encoding; Greedy algorithms; Runtime;
Conference_Titel :
Data Compression Conference, 2004. Proceedings. DCC 2004
Print_ISBN :
0-7695-2082-0
DOI :
10.1109/DCC.2004.1281499