• DocumentCode
    3732327
  • Title
    Inline Data Deduplication for SSD-Based Distributed Storage
  • Author
    Binqi Zhang; Chen Wang; Bing Bing Zhou; Albert Y. Zomaya
  • Author_Institution
    Sch. of Inf. Technol., Univ. of Sydney, Sydney, NSW, Australia
  • fYear
    2015
  • Firstpage
    593
  • Lastpage
    600
  • Abstract
    Data deduplication is used to address two issues with Solid State Drives (SSDs): the price per GB of storage space and the write limit, or disk endurance. By eliminating duplicate data, a deduplication system improves storage efficiency and protects SSDs from unnecessary writes. CAFTL is a known solution for deduplication on SSDs. However, simply applying CAFTL to SSDs in a cluster does not work well. We propose a system architecture for inline deduplication based on the existing protocol of the Hadoop Distributed File System (HDFS), aiming to address the performance challenges of primary storage. Two routing algorithms are presented and evaluated using selected real-life data sets. Compared to prior work, one routing algorithm (MMHR) may improve the deduplication ratio by 8% at minimal cost, while the other (FFFR) can achieve a roughly 30% higher deduplication ratio, with a tradeoff in chunk-level fragmentation. A new research problem, the assignment of chunks to more than one node for deduplication, is also formulated for further study in this area.
  • Keywords
    "Routing","Indexes","Distributed databases","Clustering algorithms","Systems architecture","Cloud computing","Metadata"
  • Publisher
    IEEE
  • Conference_Title
    Parallel and Distributed Systems (ICPADS), 2015 IEEE 21st International Conference on
  • Electronic_ISBN
    1521-9097
  • Type
    conf
  • DOI
    10.1109/ICPADS.2015.80
  • Filename
    7384343