• DocumentCode
    147029
  • Title
    Combining Deduplication and Delta Compression to Achieve Low-Overhead Data Reduction on Backup Datasets
  • Author
    Wen Xia ; Hong Jiang ; Dan Feng ; Lei Tian
  • Author_Institution
    Wuhan Nat. Lab. for Optoelectron., Wuhan, China
  • fYear
    2014
  • fDate
    26-28 March 2014
  • Firstpage
    203
  • Lastpage
    212
  • Abstract
    Data reduction has become increasingly important in storage systems due to the explosive growth of digital data that has ushered in the big data era. In this paper, we present DARE, a Deduplication-Aware Resemblance detection and Elimination scheme for compressing backup datasets that effectively combines data deduplication and delta compression to achieve high data reduction efficiency at low overhead. The main idea behind DARE is to employ a scheme, called Duplicate-Adjacency based Resemblance Detection (DupAdj), which considers any two data chunks to be similar (i.e., candidates for delta compression) if their respective adjacent data chunks are found to be duplicates in a deduplication system, and then to further enhance the resemblance detection efficiency with an improved super-feature approach. Our experimental results based on real-world and synthetic backup datasets show that DARE achieves additional data reduction by a factor of more than 2 (2X) on top of deduplication with very low overhead, while nearly doubling the data restore performance of deduplication-only systems by supplementing deduplication with delta compression.
  • Keywords
    Big Data; data compression; data reduction; DARE; DupAdj; backup datasets; big data era; data chunks; data deduplication; data reduction efficiency; data restore performance; deduplication-aware resemblance detection and elimination scheme; deduplication-only systems; delta compression; digital data; duplicate-adjacency based resemblance detection; low-overhead data reduction; storage systems; super-feature approach; Containers; Educational institutions; Feature extraction; Indexing; Prototypes; Redundancy; Scalability; backup storage system; data reduction; deduplication; delta compression;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Compression Conference (DCC), 2014
  • Conference_Location
    Snowbird, UT
  • ISSN
    1068-0314
  • Type
    conf
  • DOI
    10.1109/DCC.2014.38
  • Filename
    6824428
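
The duplicate-adjacency idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `fingerprint` and `dupadj_candidates`, the use of SHA-1 fingerprints, and the one-step neighbour lookup are all assumptions made for illustration, and the improved super-feature step is omitted. The sketch pairs each non-duplicate chunk in a new backup with the old chunk that sits next to the match of its duplicate neighbour.

```python
import hashlib


def fingerprint(chunk: bytes) -> str:
    # Hypothetical choice: SHA-1 digest as the chunk fingerprint.
    return hashlib.sha1(chunk).hexdigest()


def dupadj_candidates(old_chunks, new_chunks):
    """Return {new_index: old_index} pairs of delta-compression candidates.

    A non-duplicate chunk in the new stream is paired with an old chunk
    when one of its neighbours is an exact duplicate of the old chunk's
    corresponding neighbour.
    """
    # Index old chunks by fingerprint, keeping the first position seen.
    old_index = {}
    for pos, chunk in enumerate(old_chunks):
        old_index.setdefault(fingerprint(chunk), pos)

    # matches[i] is the old position of an exact duplicate, or None.
    matches = [old_index.get(fingerprint(c)) for c in new_chunks]

    candidates = {}
    for i, match in enumerate(matches):
        if match is not None:
            continue  # exact duplicate: deduplication already handles it
        # Left neighbour is a duplicate: try the chunk after its match.
        if i > 0 and matches[i - 1] is not None:
            j = matches[i - 1] + 1
            if j < len(old_chunks):
                candidates[i] = j
                continue
        # Right neighbour is a duplicate: try the chunk before its match.
        if i + 1 < len(new_chunks) and matches[i + 1] is not None:
            j = matches[i + 1] - 1
            if j >= 0:
                candidates[i] = j
    return candidates


old = [b"A", b"B", b"C", b"D"]
new = [b"A", b"B-modified", b"C", b"D"]
pairs = dupadj_candidates(old, new)
```

Here `pairs` maps the modified chunk at index 1 of the new stream to index 1 of the old stream, because both are preceded by the duplicate chunk `b"A"`. A real system would then delta-encode the new chunk against that candidate.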