• DocumentCode
    35572
  • Title

    Enabling Concurrent Failure Recovery for Regenerating-Coding-Based Storage Systems: From Theory to Practice

  • Author

    Runhui Li; Jian Lin; Patrick P. C. Lee

  • Author_Institution
    Dept. of Comput. Sci. & Eng., Chinese Univ. of Hong Kong, Shatin, China
  • Volume
    64
  • Issue
    7
  • fYear
    2015
  • fDate
    July 1 2015
  • Firstpage
    1898
  • Lastpage
    1911
  • Abstract
    Data availability is critical in distributed storage systems, especially when node failures are prevalent in real life. A key requirement is to minimize the amount of data transferred among nodes when recovering the lost or unavailable data of failed nodes. This paper explores recovery solutions based on regenerating codes, which have been designed to provide fault-tolerant storage and minimum bandwidth. Existing optimal regenerating codes are designed for single node failures. We build a system called CORE, which augments existing optimal regenerating codes for the recovery of a general number of failures including single and concurrent failures. We show theoretically that CORE achieves the minimum possible bandwidth for most cases. We implement a CORE prototype and evaluate it atop an HDFS cluster testbed with up to 20 storage nodes. We demonstrate that our CORE prototype conforms to our theoretical findings and achieves bandwidth savings when compared to the conventional recovery approach based on erasure codes.
  • Keywords
    data handling; distributed processing; program compilers; system recovery; data availability; distributed storage systems; enabling concurrent failure recovery; erasure codes; failed nodes; fault tolerant storage; optimal regenerating codes; regenerating codes; regenerating coding based storage systems; single node failures; Availability; Bandwidth; Encoding; Equations; Nickel; Peer-to-peer computing; Strips; coding theory; experiments and implementation; failure recovery
  • fLanguage
    English
  • Journal_Title
    Computers, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9340
  • Type

    jour

  • DOI
    10.1109/TC.2014.2349518
  • Filename
    6880379