Title :
Heterogeneity-aware data regeneration in distributed storage systems
Author :
Yan Wang ; Dongsheng Wei ; Xunrui Yin ; Xin Wang
Author_Institution :
Sch. of Comput. Sci., Fudan Univ., Shanghai, China
Date :
April 27 2014-May 2 2014
Abstract :
Distributed storage systems provide large-scale, reliable data storage services by spreading redundancy across a large group of storage nodes. In such large systems, node failures occur regularly. When a node fails or leaves the system, the redundant data must be regenerated at a replacement node as soon as possible to maintain the same level of redundancy. Previous studies aim to minimize the network traffic in the regeneration process, but in practical networks, where link capacities vary over a wide range, minimizing network traffic does not always minimize regeneration time. To account for heterogeneous link capacities, Li et al. proposed a tree-structured regeneration scheme, called RCTREE, which bypasses the low-capacity links encountered in direct transmissions. However, we find that RCTREE may rapidly lose data integrity after several regenerations. In this paper, we reconsider the problem of minimizing regeneration time in networks with heterogeneous link capacities. We derive the minimum amount of data that must be transmitted through each link to preserve data integrity. We prove that building an optimal regeneration tree is NP-complete and propose a heuristic algorithm that finds a near-optimal solution. We further introduce a flexible regeneration scheme, which allows providers to generate different amounts of coded data. Simulation results show that the flexible tree-structured regeneration scheme can reduce the regeneration time significantly.
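The intuition behind bypassing a low-capacity link can be illustrated with a minimal sketch. This is not the paper's heuristic or RCTREE. It is a generic widest-path (maximum-bottleneck) search, and the node names, capacities, and the function `widest_path_capacity` are hypothetical. It also ignores how much data each link must carry, which the abstract says the paper derives separately.

```python
import heapq

def widest_path_capacity(capacity, source, target):
    """Return the largest bottleneck capacity over all paths source -> target.

    `capacity` maps node -> {neighbor: link capacity}. Returns 0.0 if the
    target is unreachable. Illustrative only; not the paper's algorithm.
    """
    best = {source: float("inf")}
    heap = [(-float("inf"), source)]  # max-heap on path width via negation
    while heap:
        neg_width, node = heapq.heappop(heap)
        width = -neg_width
        if node == target:
            return width
        if width < best.get(node, 0.0):
            continue  # stale heap entry
        for neighbor, cap in capacity.get(node, {}).items():
            new_width = min(width, cap)  # bottleneck of the extended path
            if new_width > best.get(neighbor, 0.0):
                best[neighbor] = new_width
                heapq.heappush(heap, (-new_width, neighbor))
    return 0.0

# Hypothetical topology: providers A and B, replacement node N.
capacity = {
    "A": {"N": 1.0, "B": 8.0},
    "B": {"N": 10.0, "A": 8.0},
    "N": {"A": 1.0, "B": 10.0},
}
direct = capacity["A"]["N"]                           # direct link A -> N
relayed = widest_path_capacity(capacity, "A", "N")    # best path, here via B
```

In this made-up topology the direct link from A to N has capacity 1.0, while relaying through B gives a bottleneck of 8.0, which is the kind of gain a tree-structured scheme aims for. A real scheme must also account for the data that relay links carry on behalf of other providers.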
Keywords :
computer network reliability; data integrity; distributed processing; redundancy; storage area networks; storage management; telecommunication links; telecommunication traffic; trees (mathematics); NP-complete optimal regeneration tree; RCTREE; data integrity; distributed storage systems; flexible tree-structured regeneration scheme; heterogeneity-aware data regeneration; heterogeneous link capacities; heuristic algorithm; large-scale reliable data storage services; low-capacitated link; network traffic; node failures; redundant data regeneration; regeneration process; regeneration time minimization; storage nodes; Bandwidth; Conferences; Distributed databases; Maintenance engineering; Overlay networks; Peer-to-peer computing; Topology;
Conference_Title :
INFOCOM, 2014 Proceedings IEEE
Conference_Location :
Toronto, ON
DOI :
10.1109/INFOCOM.2014.6848127