DocumentCode :
3600760
Title :
Exploiting Pipelined Encoding Process to Boost Erasure-Coded Data Archival
Author :
Jianzhong Huang ; Yanqun Wang ; Xiao Qin ; Xianhai Liang ; Shu Yin ; Changsheng Xie
Author_Institution :
Wuhan Nat. Lab. for Optoelectron., Huazhong Univ. of Sci. & Technol., Wuhan, China
Volume :
26
Issue :
11
fYear :
2015
Firstpage :
2984
Lastpage :
2996
Abstract :
This paper addresses an issue of erasure-coded data archival, where (k + r, k) erasure codes are employed to archive rarely accessed replicas. The traditional synchronous encoding process neither leverages the existence of replicas nor handles encoding operations in a decentralized manner. To overcome these drawbacks, we exploit pipelined encoding processes to boost data archival performance on storage clusters. First, we propose two data layouts called [D + P]cd and [3X]cd by applying a chained-declustering mechanism to both Mirrored RAID-5 and triplication redundancy groups. Second, in light of the [D + P]cd and [3X]cd layouts, we design two archiving schemes named DP and 3X, which exhibit the following three salient features: (i) exploiting data locality (two or three local blocks are read by each involved node for encoding); (ii) decentralized computation load (encoding operations are distributed among k nodes); and (iii) parallel archival processing (two or three encoding pipelines are simultaneously deployed to generate parity blocks). We implement both the DP and 3X schemes and three existing solutions (i.e., SynE, DE, and RapidRAID) in a real-world storage cluster. Experimental results show that our archival schemes outperform the other three solutions in terms of archiving time by a factor of at least 3.41 in a nine-node storage cluster. The experiments strongly indicate that the performance bottleneck of SynE lies in its block-receiving stage, whereas for both the DE and RapidRAID schemes it is disk I/O rather than network traffic that dominates archiving time.
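The contrast the abstract draws between synchronous (centralized) encoding and pipelined, decentralized encoding can be illustrated with a small sketch. It assumes a linear code over GF(2^8) with primitive polynomial 0x11d and uses a hypothetical 2 x 4 coefficient matrix (k = 4, r = 2); the functions `centralized_encode` and `pipelined_encode` are illustrative names, not the authors' DP or 3X implementation. In the pipelined version, each of the k nodes adds the contribution of its local block to the r partial parities and forwards them to the next node, so no single node must receive all k data blocks.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^8) elements (primitive polynomial 0x11d, an assumption)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:  # reduce back into 8 bits
            a ^= 0x11D
        b >>= 1
    return result


def scale_block(coef: int, block: bytes) -> bytes:
    """Multiply every byte of a block by a GF(2^8) coefficient."""
    return bytes(gf_mul(coef, x) for x in block)


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Addition in GF(2^8) is bytewise XOR."""
    return bytes(x ^ y for x, y in zip(a, b))


def centralized_encode(data_blocks, coeffs):
    """Synchronous-style encoding: one node gathers all k blocks and computes r parities."""
    size = len(data_blocks[0])
    parities = []
    for row in coeffs:
        acc = bytes(size)
        for c, d in zip(row, data_blocks):
            acc = xor_blocks(acc, scale_block(c, d))
        parities.append(acc)
    return parities


def pipelined_encode(data_blocks, coeffs):
    """Pipelined encoding: stage i runs on node i, using only node i's local block."""
    size = len(data_blocks[0])
    partial = [bytes(size) for _ in coeffs]  # node 0 starts from all-zero partial parities
    for i, block in enumerate(data_blocks):
        partial = [
            xor_blocks(p, scale_block(coeffs[j][i], block))
            for j, p in enumerate(partial)
        ]
        # In a real cluster, `partial` would now be sent over the network to node i + 1.
    return partial


if __name__ == "__main__":
    blocks = [bytes(i * 16 + j for j in range(8)) for i in range(4)]  # k = 4 toy blocks
    coeffs = [[1, 1, 1, 1], [1, 2, 4, 8]]  # hypothetical r = 2 coefficient rows
    assert pipelined_encode(blocks, coeffs) == centralized_encode(blocks, coeffs)
    print("pipelined and centralized parities match")
```

Because encoding is linear, the order in which nodes fold in their blocks does not change the result, which is what lets the computation be spread along a pipeline. Running two or three such pipelines at once, as the abstract describes for the DP and 3X schemes, would amount to invoking this stage loop concurrently over different stripes.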
Keywords :
codes; information retrieval systems; pattern clustering; pipeline processing; storage management; 3X archiving scheme; DP archiving scheme; Mirrored RAID-5; SynE; boost erasure-coded data archival; chained-declustering mechanism; data layouts; data locality; decentralized computation load; encoding operations; disk I/O; parallel archival processing; pipelined encoding process; storage clusters; synchronous encoding process; triplication redundancy groups; Bandwidth; Distributed databases; Educational institutions; Encoding; Layout; Pipelines; Redundancy; Erasure-coded storage cluster; data archival; pipelined encoding; power efficiency;
fLanguage :
English
Journal_Title :
Parallel and Distributed Systems, IEEE Transactions on
Publisher :
ieee
ISSN :
1045-9219
Type :
jour
DOI :
10.1109/TPDS.2014.2366113
Filename :
6942231