• DocumentCode
    1405348
  • Title
    Efficient Deduplication Techniques for Modern Backup Operation
  • Author
    Min, Jaehong; Yoon, Daeyoung; Won, Youjip
  • Author_Institution
    Dept. of Comput. Sci., Hanyang Univ., Seoul, South Korea
  • Volume
    60
  • Issue
    6
  • fYear
    2011
  • fDate
    6/1/2011 12:00:00 AM
  • Firstpage
    824
  • Lastpage
    840
  • Abstract
    In this work, we focus on optimizing the deduplication system by adjusting the pertinent factors in fingerprint lookup and chunking, which we identify as the key ingredients of efficient deduplication. For efficient fingerprint lookup, we propose a fingerprint management scheme called LRU-based Index Partitioning. For efficient chunking, we propose the Incremental Modulo-K (INC-K) algorithm, an optimized version of Rabin's algorithm that significantly reduces the number of arithmetic operations by exploiting the algebraic nature of modulo arithmetic. LRU-based Index Partitioning uses the notion of a tablet and enforces access locality of fingerprint lookup when storing fingerprints. We maintain tablets in an LRU manner to exploit the temporal locality of fingerprint lookup. To preserve access correlation across tablets, we apply prefetching when maintaining the tablet list. We propose Context-aware chunking to maximize both chunking speed and deduplication ratio. We developed a prototype backup system and performed a comprehensive analysis of various factors and their relationships: average chunk size, chunking speed, deduplication ratio, tablet management algorithms, and overall backup speed. By increasing the average chunk size from 4 KB to 10 KB, chunking time increases by 34.3 percent, deduplication ratio decreases by 0.66 percent, and overall backup speed increases by 50 percent (from 51.4 MB/sec to 77.8 MB/sec).
  • Keywords
    file organisation; LRU-based index partitioning; Rabin algorithm; context-aware chunking; deduplication technique; fingerprint chunking; fingerprint lookup; fingerprint management scheme; incremental modulo-K algorithm; tablet notion; Fingerprint recognition; Generators; History; Indexes; Partitioning algorithms; Redundancy; Servers; Deduplication; backup; chunking; fingerprint lookup; index partitioning
  • fLanguage
    English
  • Journal_Title
    Computers, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9340
  • Type
    jour
  • DOI
    10.1109/TC.2010.263
  • Filename
    5669285
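The abstract describes INC-K only as an optimization of Rabin's algorithm that exploits modulo arithmetic, without giving its formulation. As a rough, hypothetical illustration of the general idea, the Python sketch below implements content-defined chunking with a Rabin-Karp-style polynomial rolling hash over a prime modulus (a swapped-in technique; classic Rabin fingerprints use polynomials over GF(2), and this is not the authors' INC-K). The incremental update removes the outgoing byte with a precomputed B^(WINDOW-1) mod MOD instead of rehashing the window, and a chunk boundary is declared when the hash modulo K hits a fixed target. The function name `chunk_boundaries` and all constants (`WINDOW`, `BASE`, `MOD`, `K`, `TARGET`, `MIN_CHUNK`, `MAX_CHUNK`) are assumptions chosen for illustration.

```python
# Hypothetical sketch: content-defined chunking with a Rabin-Karp-style
# polynomial rolling hash (prime modulus). This is NOT the paper's INC-K.

WINDOW = 48            # assumed sliding-window width in bytes
BASE = 257             # assumed polynomial base
MOD = (1 << 61) - 1    # assumed large prime modulus (Mersenne prime 2^61 - 1)
K = 4096               # assumed divisor: a boundary fires roughly once per K positions
TARGET = K - 1         # boundary when hash % K == TARGET
MIN_CHUNK = 1024       # assumed minimum chunk length (must be >= WINDOW)
MAX_CHUNK = 16384      # assumed maximum chunk length (forced cut)

# B^(WINDOW-1) mod MOD, precomputed once so that the byte leaving the
# window can be subtracted in O(1) instead of rehashing the whole window.
OUT_FACTOR = pow(BASE, WINDOW - 1, MOD)


def chunk_boundaries(data: bytes) -> list[int]:
    """Return the exclusive end offset of every chunk in `data`."""
    boundaries: list[int] = []
    start = 0  # offset where the current chunk begins
    h = 0      # hash of the last min(WINDOW, bytes-in-chunk) bytes
    for i, byte in enumerate(data):
        if i - start >= WINDOW:
            # Drop the outgoing byte: h -= data[i - WINDOW] * B^(WINDOW-1).
            h = (h - data[i - WINDOW] * OUT_FACTOR) % MOD
        # Shift the remaining bytes up one power of BASE and append the new byte.
        h = (h * BASE + byte) % MOD
        length = i - start + 1
        if (length >= MIN_CHUNK and h % K == TARGET) or length >= MAX_CHUNK:
            boundaries.append(i + 1)
            start = i + 1
            h = 0  # restart the window at the next chunk
    if start < len(data):
        boundaries.append(len(data))  # trailing partial chunk
    return boundaries
```

Because a content-defined cut depends only on the last `WINDOW` bytes, an insertion early in a file tends to disturb only nearby boundaries, which is what lets unchanged chunks deduplicate. For input with no content-defined cut points, such as `bytes(40000)` (all zeros, whose window hash is always 0), only the `MAX_CHUNK` limit applies and the function returns `[16384, 32768, 40000]`. Raising `K` or `MIN_CHUNK` raises the average chunk size, the parameter the abstract varies from 4 KB to 10 KB.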