DocumentCode
147029
Title
Combining Deduplication and Delta Compression to Achieve Low-Overhead Data Reduction on Backup Datasets
Author
Wen Xia ; Hong Jiang ; Dan Feng ; Lei Tian
Author_Institution
Wuhan Nat. Lab. for Optoelectron., Wuhan, China
fYear
2014
fDate
26-28 March 2014
Firstpage
203
Lastpage
212
Abstract
Data reduction has become increasingly important in storage systems due to the explosive growth of digital data that has ushered in the big data era. In this paper, we present DARE, a Deduplication-Aware Resemblance detection and Elimination scheme for compressing backup datasets that effectively combines data deduplication and delta compression to achieve high data reduction efficiency at low overhead. The main idea behind DARE is to employ a scheme, called Duplicate-Adjacency based Resemblance Detection (DupAdj), that considers any two data chunks to be similar (i.e., candidates for delta compression) if their respective adjacent data chunks are found to be duplicates in a deduplication system, and then to further enhance the resemblance detection efficiency with an improved super-feature approach. Our experimental results based on real-world and synthetic backup datasets show that DARE achieves additional data reduction by a factor of more than 2 (2X) on top of deduplication with very low overhead, while nearly doubling the data restore performance of deduplication-only systems by supplementing deduplication with delta compression.
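The duplicate-adjacency idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `fingerprint` and `dupadj_candidates`, the use of SHA-1 fingerprints, and the restriction to immediate neighbours only are all assumptions made for illustration. Given an old and a new chunk sequence, it marks a non-duplicate chunk in the new sequence as a delta-compression candidate against the old chunk that occupies the same position relative to a duplicate pair.

```python
import hashlib


def fingerprint(chunk: bytes) -> str:
    # Hypothetical choice: SHA-1 as the chunk fingerprint for exact-duplicate detection.
    return hashlib.sha1(chunk).hexdigest()


def dupadj_candidates(old_chunks, new_chunks):
    """Return (new_index, old_index) pairs of chunks assumed to be similar
    because they sit directly next to a duplicate chunk pair."""
    # Index the old stream by fingerprint; the first occurrence wins.
    old_index = {}
    for j, chunk in enumerate(old_chunks):
        old_index.setdefault(fingerprint(chunk), j)

    # For each new chunk, the index of the old chunk it duplicates, or None.
    dup_of = [old_index.get(fingerprint(c)) for c in new_chunks]

    candidates = []
    for i, j in enumerate(dup_of):
        if j is None:
            continue  # only duplicate chunks can vouch for their neighbours
        for step in (-1, 1):
            ni, oj = i + step, j + step
            in_range = 0 <= ni < len(new_chunks) and 0 <= oj < len(old_chunks)
            # The neighbour must itself be a non-duplicate to need delta compression.
            if in_range and dup_of[ni] is None and (ni, oj) not in candidates:
                candidates.append((ni, oj))
    return candidates


old = [b"A", b"B", b"C", b"D"]
new = [b"A", b"B-modified", b"C", b"D"]
pairs = dupadj_candidates(old, new)
```

In this example the second chunk of `new` is not a duplicate, but both of its neighbours are, so it is paired with the second chunk of `old` as a delta-compression candidate. Chunks that this adjacency test does not cover would, per the abstract, fall to the super-feature approach, which is not shown here.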
Keywords
Big Data; data compression; data reduction; DARE; DupAdj; backup datasets; big data era; data chunks; data deduplication; data reduction efficiency; data restore performance; deduplication-aware resemblance detection and elimination scheme; deduplication-only systems; delta compression; digital data; duplicate-adjacency based resemblance detection; low-overhead data reduction; storage systems; super-feature approach; Containers; Educational institutions; Feature extraction; Indexing; Prototypes; Redundancy; Scalability; backup storage system; data reduction; deduplication; delta compression;
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Compression Conference (DCC), 2014
Conference_Location
Snowbird, UT
ISSN
1068-0314
Type
conf
DOI
10.1109/DCC.2014.38
Filename
6824428