DocumentCode
170729
Title
Heterogeneity-aware data regeneration in distributed storage systems
Author
Yan Wang ; Dongsheng Wei ; Xunrui Yin ; Xin Wang
Author_Institution
Sch. of Comput. Sci., Fudan Univ., Shanghai, China
fYear
2014
fDate
April 27 2014-May 2 2014
Firstpage
1878
Lastpage
1886
Abstract
Distributed storage systems provide large-scale reliable data storage services by spreading redundancy across a large group of storage nodes. In such large systems, node failures take place on a regular basis. When a node fails or leaves the system, the redundant data should be regenerated at a replacement node as soon as possible to maintain the same level of redundancy. Previous studies aim to minimize the network traffic in the regeneration process, but in practical networks, where link capacities vary over a wide range, minimizing network traffic does not always mean minimizing regeneration time. Considering the heterogeneous link capacities, Li et al. proposed a tree-structured regeneration scheme, called RCTREE, to bypass the low-capacitated links encountered in direct transmissions. However, we find that RCTREE may rapidly lose data integrity after several regenerations. In this paper, we reconsider the problem of minimizing regeneration time in networks with heterogeneous link capacities. We derive the minimum amount of data to be transmitted through each link to preserve data integrity. We prove that building an optimal regeneration tree is NP-complete and propose a heuristic algorithm for a near-optimal solution. We further introduce a flexible regeneration scheme, which allows providers to generate different amounts of coded data. Simulation results show that the flexible tree-structured regeneration scheme can reduce the regeneration time significantly.
Keywords
computer network reliability; data integrity; distributed processing; redundancy; storage area networks; storage management; telecommunication links; telecommunication traffic; trees (mathematics); NP-complete optimal regeneration tree; RCTREE; distributed storage systems; flexible tree-structured regeneration scheme; heterogeneity-aware data regeneration; heterogeneous link capacities; heuristic algorithm; large-scale reliable data storage services; low-capacitated link; network traffic; node failures; redundant data regeneration; regeneration process; regeneration time minimization; storage nodes; Bandwidth; Conferences; Distributed databases; Maintenance engineering; Overlay networks; Peer-to-peer computing; Topology
fLanguage
English
Publisher
IEEE
Conference_Titel
INFOCOM, 2014 Proceedings IEEE
Conference_Location
Toronto, ON
Type
conf
DOI
10.1109/INFOCOM.2014.6848127
Filename
6848127