Title :
AE: An Asymmetric Extremum content defined chunking algorithm for fast and bandwidth-efficient data deduplication
Author :
Yucheng Zhang ; Hong Jiang ; Dan Feng ; Wen Xia ; Min Fu ; Fangting Huang ; Yukun Zhou
Author_Institution :
Wuhan Nat. Lab. for Optoelectron., Huazhong Univ. of Sci. & Technol., Wuhan, China
Date :
April 26 2015-May 1 2015
Abstract :
Data deduplication, a space-efficient and bandwidth-saving technology, plays an important role in bandwidth-efficient data transmission in various data-intensive network and cloud applications. Rabin-based and MAXP-based Content-Defined Chunking (CDC) algorithms, while robust in finding suitable cut-points for chunk-level redundancy elimination, face two key challenges: (1) low chunking throughput, which makes the chunking stage the performance bottleneck of deduplication, and (2) large chunk-size variance, which decreases deduplication efficiency. To address these challenges, this paper proposes a new CDC algorithm called the Asymmetric Extremum (AE) algorithm. AE builds on the observation that, when the boundary-shift problem occurs, the extreme value in an asymmetric local range is unlikely to be replaced by a new extreme value. This motivates AE's use of an asymmetric local range (rather than a symmetric one, as in MAXP) to identify cut-points, achieving high chunking throughput and low chunk-size variance at the same time. As a result, AE addresses both the low chunking throughput of MAXP and Rabin and the high chunk-size variance of Rabin. Experimental results based on four real-world datasets show that AE improves the throughput of state-of-the-art CDC algorithms by 3x while attaining comparable or higher deduplication efficiency.
Keywords :
computer networks; data handling; AE algorithms; CDC algorithm; asymmetric extremum algorithm; asymmetric extremum content defined chunking algorithm; bandwidth efficient data transmission; bandwidth saving technology; bandwidth-efficient data deduplication; cloud applications; content defined chunking algorithms; fast data deduplication; Algorithm design and analysis; Arrays; Computers; Conferences; Power capacitors; Redundancy; Throughput;
Conference_Title :
Computer Communications (INFOCOM), 2015 IEEE Conference on
Conference_Location :
Kowloon
DOI :
10.1109/INFOCOM.2015.7218510
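
Illustrative sketch :
The abstract describes AE only at a high level, so the following is a minimal sketch of one plausible reading of the idea, not the authors' implementation. It assumes the extremum is a maximum, that the compared values are single bytes (a simplification chosen here for brevity), and that a cut-point is declared once a fixed number of positions to the right of the current maximum has been scanned without finding a larger value. The names `ae_cut`, `ae_chunks`, and `window` are hypothetical and do not come from the record above.

```python
def ae_cut(data: bytes, start: int, window: int) -> int:
    """Return the exclusive end index of the chunk that begins at `start`.

    Assumption: the "asymmetric local range" is a variable-length run to the
    left of the maximum plus a fixed run of `window` positions to its right.
    """
    n = len(data)
    max_pos = start          # position of the largest value seen so far
    max_val = data[start]    # indexing bytes yields an int in 0..255
    i = start + 1
    while i < n:
        if data[i] <= max_val:
            # The maximum has survived `window` positions: cut here.
            if i == max_pos + window:
                return i + 1
        else:
            # A strictly larger value replaces the maximum.
            max_val = data[i]
            max_pos = i
        i += 1
    return n                 # no cut-point found: the rest is one chunk


def ae_chunks(data: bytes, window: int = 64) -> list[bytes]:
    """Split `data` into chunks by applying `ae_cut` repeatedly."""
    chunks = []
    start = 0
    while start < len(data):
        end = ae_cut(data, start, window)
        chunks.append(data[start:end])
        start = end
    return chunks


if __name__ == "__main__":
    sample = bytes([5, 1, 2, 3, 9, 1, 1, 1, 0, 0])
    for chunk in ae_chunks(sample, window=3):
        print(list(chunk))
```

In this sketch each position costs one comparison and no backtracking, which is consistent with the throughput motivation given in the abstract. Every chunk except possibly the last is at least `window + 1` bytes long, which bounds chunk size from below.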