• DocumentCode
    3532764
  • Title

    ADMAD: Application-Driven Metadata Aware De-duplication Archival Storage System

  • Author

    Liu, Chuanyi ; Lu, Yingping ; Shi, Chunhui ; Lu, Guanlin ; Du, David H C ; Wang, Dong-Sheng

  • Author_Institution
    Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing
  • fYear
    2008
  • fDate
    22 Sept. 2008
  • Firstpage
    29
  • Lastpage
    35
  • Abstract
    Current storage systems hold a huge amount of duplicated or redundant data. Data de-duplication, which uses lossless data compression schemes to minimize duplicated data at the inter-file level, has therefore received broad attention in recent years. However, current approaches and storage systems still face research challenges, such as how to chunk files more efficiently and better exploit potential similarity and identity among dedicated applications, and how to store the chunks effectively and reliably on secondary storage devices. In this paper, we propose ADMAD, an application-driven metadata aware de-duplication archival storage system, which uses certain metadata from different levels of the I/O path to direct the partitioning of files into more meaningful data chunks (MC), so as to maximally reduce inter-file level duplication. However, the chunks may vary in length, and storing them directly on storage devices may cause heavy fragmentation and a high percentage of random disk accesses, which is very inefficient. Therefore, in ADMAD, chunks are further packaged into fixed-size objects that serve as the storage units, both to speed up I/O and to ease data management. Preliminary experiments demonstrate that the proposed system further reduces the required storage space compared with current methods (by 20% to nearly 50% across several datasets) and substantially improves write performance (by about 50%-70% on average).
  • Keywords
    data compression; information retrieval systems; meta data; storage management; application-driven metadata aware deduplication archival storage system; data chunks; leverage potential similarity; lossless data compression schemes; Application software; Computer architecture; Computer science; Conferences; Data compression; Data engineering; Fingerprint recognition; Network servers; Operating systems; USA Councils; Rabin fingerprinting; archival storage system; cryptographic hash functions; data de-duplication;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Storage Network Architecture and Parallel I/Os, 2008. SNAPI '08. Fifth IEEE International Workshop on
  • Conference_Location
    Baltimore, MD
  • Print_ISBN
    978-0-7695-3408-4
  • Type

    conf

  • DOI
    10.1109/SNAPI.2008.11
  • Filename
    4685844