Title :
Combining Deduplication and Delta Compression to Achieve Low-Overhead Data Reduction on Backup Datasets
Author :
Wen Xia ; Hong Jiang ; Dan Feng ; Lei Tian
Author_Institution :
Wuhan Nat. Lab. for Optoelectron., Wuhan, China
Abstract :
Data reduction has become increasingly important in storage systems due to the explosive growth of digital data that has ushered in the big data era. In this paper, we present DARE, a Deduplication-Aware Resemblance detection and Elimination scheme for compressing backup datasets that combines data deduplication and delta compression to achieve high data reduction efficiency at low overhead. The main idea behind DARE is to employ a scheme called Duplicate-Adjacency based Resemblance Detection (DupAdj), which considers any two data chunks to be similar (i.e., candidates for delta compression) if their respective adjacent data chunks are found to be duplicates in a deduplication system, and then to further enhance resemblance detection efficiency with an improved super-feature approach. Our experimental results on real-world and synthetic backup datasets show that, by supplementing deduplication with delta compression, DARE achieves additional data reduction by a factor of more than 2 (2X) on top of deduplication at very low overhead, while nearly doubling the data restore performance of deduplication-only systems.
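The duplicate-adjacency idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (fingerprint, dupadj_candidates), the use of SHA-1 fingerprints, the pre-chunked in-memory inputs, and the rule of checking only the immediately preceding and following chunk are all assumptions made for illustration. The super-feature step is not shown.

```python
import hashlib


def fingerprint(chunk: bytes) -> str:
    # Assumed fingerprint function; any collision-resistant hash would do here.
    return hashlib.sha1(chunk).hexdigest()


def dupadj_candidates(base_chunks, new_chunks):
    """Return {new_index: base_index} pairs proposed as delta-compression candidates.

    Hypothetical helper: a non-duplicate chunk in new_chunks is paired with the
    base chunk that sits next to the duplicate of its own neighbor.
    """
    # Index the base stream by fingerprint, keeping the first occurrence.
    base_index = {}
    for i, chunk in enumerate(base_chunks):
        base_index.setdefault(fingerprint(chunk), i)

    # match[j] is the base position that new chunk j duplicates, or None.
    match = [base_index.get(fingerprint(chunk)) for chunk in new_chunks]

    candidates = {}
    for j, m in enumerate(match):
        if m is not None:
            continue  # exact duplicate: deduplication already removes it
        prev_dup = match[j - 1] if j > 0 else None
        next_dup = match[j + 1] if j + 1 < len(new_chunks) else None
        if prev_dup is not None and prev_dup + 1 < len(base_chunks):
            # Previous neighbor is a duplicate: pair with the chunk after it in base.
            candidates[j] = prev_dup + 1
        elif next_dup is not None and next_dup - 1 >= 0:
            # Next neighbor is a duplicate: pair with the chunk before it in base.
            candidates[j] = next_dup - 1
    return candidates


if __name__ == "__main__":
    base = [b"aaaa", b"bbbb", b"cccc"]
    new = [b"aaaa", b"bbXb", b"cccc"]
    print(dupadj_candidates(base, new))
```

In this sketch the modified middle chunk of the new stream is paired with the middle chunk of the base stream because both of its neighbors are exact duplicates. Each proposed pair would then be passed to a delta encoder, and chunks without any duplicate neighbor would be left to a feature-based detector.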
Keywords :
Big Data; data compression; data reduction; DARE; DupAdj; backup datasets; big data era; data chunks; data deduplication; data reduction efficiency; data restore performance; deduplication-aware resemblance detection and elimination scheme; deduplication-only systems; delta compression; digital data; duplicate-adjacency based resemblance detection; low-overhead data reduction; storage systems; super-feature approach; Containers; Educational institutions; Feature extraction; Indexing; Prototypes; Redundancy; Scalability; backup storage system; deduplication
Conference_Title :
Data Compression Conference (DCC), 2014
Conference_Location :
Snowbird, UT
DOI :
10.1109/DCC.2014.38