  • DocumentCode
    610341
  • Title
    HANDS: A heuristically arranged non-backup in-line deduplication system
  • Author
    Wildani, A.; Miller, Eric L.; Rodeh, O.
  • Author_Institution
    Storage Syst. Res. Center, UC Santa Cruz, Santa Cruz, CA, USA
  • fYear
    2013
  • fDate
    8-12 April 2013
  • Firstpage
    446
  • Lastpage
    457
  • Abstract
    Deduplicating in-line data on primary storage is hampered by the disk bottleneck problem, an issue that results from the need to keep an index mapping portions of data to hash values in memory in order to detect duplicate data without paying the performance penalty of disk paging. The index size is proportional to the volume of unique data, so placing the entire index into RAM is not cost-effective with a deduplication ratio below 45%. HANDS reduces the amount of in-memory index storage required by up to 99% while still achieving between 30% and 90% of the deduplication a full memory-resident index provides, making primary deduplication cost-effective in workloads with deduplication rates as low as 8%. HANDS is a framework that dynamically prefetches fingerprints from disk into a memory cache according to working sets statistically derived from access patterns. We use a simple neighborhood grouping as our statistical technique to demonstrate the effectiveness of our approach. HANDS is modular and requires only spatio-temporal data, making it suitable for a wide range of storage systems without the need to modify host file systems.
  • Keywords
    cache storage; paged storage; random-access storage; statistical analysis; storage management; HANDS; RAM; deduplication cost; deduplication ratio; disk bottleneck problem; disk paging; duplicate data detection; fingerprint prefetching; full memory-resident index; in-memory index storage; index mapping portion; memory cache; neighborhood grouping; nonbackup in-line deduplication system; performance penalty; primary storage; spatio-temporal data; statistical technique; storage system; Indexes; Measurement; Memory management; Organizations; Prediction algorithms; Random access memory; Scalability;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering (ICDE), 2013 IEEE 29th International Conference on
  • Conference_Location
    Brisbane, QLD
  • ISSN
    1063-6382
  • Print_ISBN
    978-1-4673-4909-3
  • Electronic_ISBN
    1063-6382
  • Type
    conf
  • DOI
    10.1109/ICDE.2013.6544846
  • Filename
    6544846
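The abstract describes HANDS only at a high level: fingerprints are prefetched from an on-disk index into a bounded memory cache according to working sets derived from access patterns, with neighborhood grouping as the statistical technique. The following is a minimal Python sketch of that general idea under stated assumptions, not the authors' implementation. The grouping rule (consecutive accesses no more than `window` time units apart form one neighborhood), the LRU eviction policy, the block-to-fingerprint dictionary standing in for the on-disk index, and all names (`neighborhood_groups`, `PrefetchingFingerprintCache`, `write`) are hypothetical.

```python
from collections import OrderedDict


def neighborhood_groups(trace, window):
    """Split an access trace into neighborhoods (hypothetical grouping rule).

    trace:  list of (timestamp, block_id) pairs sorted by timestamp.
    window: maximum gap between consecutive accesses in one neighborhood.
    Returns a dict mapping each block_id to the frozenset of its neighborhood.
    If a block appears in several neighborhoods, the last one wins.
    """
    groups, current, last_t = [], set(), None
    for t, block in trace:
        if last_t is not None and t - last_t > window:
            groups.append(current)  # gap too large: close the current neighborhood
            current = set()
        current.add(block)
        last_t = t
    if current:
        groups.append(current)
    membership = {}
    for group in groups:
        frozen = frozenset(group)
        for block in frozen:
            membership[block] = frozen
    return membership


class PrefetchingFingerprintCache:
    """Bounded in-memory fingerprint cache that prefetches whole neighborhoods.

    on_disk_index stands in for the full on-disk index; for simplicity it is a
    dict mapping block_id -> fingerprint (a real index maps fingerprint -> location).
    """

    def __init__(self, on_disk_index, groups, capacity):
        self.disk = dict(on_disk_index)  # copy so the caller's dict is untouched
        self.groups = groups
        self.capacity = capacity
        self.cache = OrderedDict()  # fingerprint -> owning block_id, in LRU order
        self.disk_reads = 0  # number of fingerprints paged in from "disk"

    def _insert(self, fp, block):
        self.cache[fp] = block
        self.cache.move_to_end(fp)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry

    def _prefetch(self, block):
        # Page in the fingerprints of every block in the accessed block's neighborhood.
        for neighbor in self.groups.get(block, ()):
            fp = self.disk.get(neighbor)
            if fp is not None and fp not in self.cache:
                self.disk_reads += 1
                self._insert(fp, neighbor)

    def write(self, block, fp):
        """Handle an in-line write; return True if fp was found in memory (deduplicated)."""
        self._prefetch(block)
        is_dup = fp in self.cache
        owner = self.cache[fp] if is_dup else block  # keep pointing at the existing copy
        self.disk[block] = fp
        self._insert(fp, owner)
        return is_dup


if __name__ == "__main__":
    trace = [(0, "a"), (1, "b"), (2, "c"), (10, "x"), (11, "y")]
    groups = neighborhood_groups(trace, window=2)
    disk = {"a": "fp1", "b": "fp2", "c": "fp3", "x": "fp9"}

    cache = PrefetchingFingerprintCache(disk, groups, capacity=4)
    print(cache.write("b", "fp3"))  # True: fp3 (block c) arrived with b's neighborhood
    no_prefetch = PrefetchingFingerprintCache(disk, {}, capacity=4)
    print(no_prefetch.write("b", "fp3"))  # False: fingerprint never paged in, duplicate missed
```

The second call illustrates the trade-off the abstract describes: when memory holds only prefetched neighborhoods rather than the full index, a duplicate whose fingerprint was never paged in goes undetected, in exchange for a much smaller in-memory footprint.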