Title :
Inline Data Deduplication for SSD-Based Distributed Storage
Author :
Binqi Zhang;Chen Wang;Bing Bing Zhou;Albert Y. Zomaya
Author_Institution :
Sch. of Inf. Technol., Univ. of Sydney, Sydney, NSW, Australia
Abstract :
Data deduplication is used to overcome two issues with Solid State Drives (SSDs): the price per GB of storage space, and the write limit, or disk endurance. By eliminating duplicate data, a deduplication system improves storage efficiency and protects SSDs from unnecessary writes. CAFTL is a known solution for deduplication on a single SSD. However, simply applying CAFTL to SSDs in a cluster does not work well. We propose a system architecture for inline deduplication based on the existing protocol of the Hadoop Distributed File System (HDFS), aimed at addressing the performance challenges of primary storage. Two routing algorithms are presented and evaluated using selected real-life data sets. Compared to prior work, one routing algorithm (MMHR) may improve the deduplication ratio by 8% at minimal cost, while the other (FFFR) can achieve about a 30% higher deduplication ratio at the cost of chunk-level fragmentation. A new research problem, assigning chunks to more than one node for deduplication, is also formulated for further study in this area.
Keywords :
"Routing","Indexes","Distributed databases","Clustering algorithms","Systems architecture","Cloud computing","Metadata"
Conference_Title :
Parallel and Distributed Systems (ICPADS), 2015 IEEE 21st International Conference on
Electronic_ISBN :
1521-9097
DOI :
10.1109/ICPADS.2015.80