DocumentCode :
656190
Title :
Load-Balanced Recovery Schemes for Single-Disk Failure in Storage Systems with Any Erasure Code
Author :
Xianghong Luo ; Jiwu Shu
Author_Institution :
Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
fYear :
2013
fDate :
1-4 Oct. 2013
Firstpage :
552
Lastpage :
561
Abstract :
As a growing volume of data demanding high reliability is stored in disk arrays protected by erasure codes, various codes with different error detection and correction capabilities have been proposed. For higher reliability, codes that can correct multiple errors (such as RDP, EVENODD, and STAR) have become popular. For each of these codes, there can be a number of recovery schemes for regenerating lost data. Among them, the scheme that recovers data after a single-disk failure is the most critical to systems' performance and reliability, because in most systems the recovery process is initiated as soon as the first failure is detected, to reduce the window of vulnerability. Although there have been efforts to improve recovery performance for single-disk failure, they either focus only on minimizing the total amount of data accessed for the recovery, which does not necessarily translate into minimal recovery time, or are designed only for specific codes and lack generality. In this paper, we propose two recovery algorithms that not only work with any erasure code and access a minimal amount of data, but also minimize the variation in the volume of data accessed on different disks. By minimizing the variation, disk accesses can be fully parallelized and the recovery load is balanced, resulting in faster recovery. We have implemented the recovery schemes in the Jerasure (ver. 1.2) library and evaluated them on a system with 16 SAS disks. Our measurements show that the recovery schemes generated by our algorithms reduce the recovery time for single-disk failures by as much as 19.9% compared with state-of-the-art recovery schemes.
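The selection criterion described in the abstract (minimal total data accessed, then the most even spread across disks) can be illustrated with a minimal sketch. This is not the authors' algorithm: it is a brute-force search over a toy input, and the function name `pick_scheme`, the `(disk, row)` symbol encoding, and the use of the maximum per-disk read count as a stand-in for "variation" are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def pick_scheme(options):
    """Pick one candidate read set per lost symbol (hypothetical helper).

    options[i] lists the candidate read sets for lost symbol i; each read
    set is a frozenset of (disk, row) symbols that one parity equation
    would need. Returns the chosen reads and the key (total, max per disk).
    """
    best_reads, best_key = None, None
    for combo in product(*options):
        # A symbol needed by several equations is read only once.
        reads = frozenset().union(*combo)
        per_disk = Counter(disk for disk, _ in reads)
        # Assumption: minimize total reads first, then the busiest disk.
        key = (len(reads), max(per_disk.values()))
        if best_key is None or key < best_key:
            best_reads, best_key = reads, key
    return best_reads, best_key


# Toy input: two lost symbols; the first has two candidate equations.
options = [
    [frozenset({(1, 0), (1, 1)}), frozenset({(2, 0), (1, 1)})],
    [frozenset({(1, 1), (3, 1)})],
]
reads, key = pick_scheme(options)
print(sorted(reads), key)
```

Both candidate combinations read three symbols in total, but the first places two of them on disk 1, so the sketch prefers the second, which reads one symbol from each of three disks. Exhaustive enumeration is only workable for tiny inputs like this one.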
Keywords :
fault tolerant computing; resource allocation; storage management; Jerasure library; erasure code; error correction capabilities; error detection capabilities; failure recovery performance; load-balanced recovery schemes; lost data regeneration; recovery algorithms; single-disk failure; storage systems; Algorithm design and analysis; Arrays; Equations; Generators; Mathematical model; Measurement; Optimization; erasure code; load balance; recovery time; single disk failure;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel Processing (ICPP), 2013 42nd International Conference on
Conference_Location :
Lyon
ISSN :
0190-3918
Type :
conf
DOI :
10.1109/ICPP.2013.69
Filename :
6687393