• DocumentCode
    170729
  • Title
    Heterogeneity-aware data regeneration in distributed storage systems
  • Author
    Yan Wang ; Dongsheng Wei ; Xunrui Yin ; Xin Wang
  • Author_Institution
    Sch. of Comput. Sci., Fudan Univ., Shanghai, China
  • fYear
    2014
  • fDate
    April 27 - May 2, 2014
  • Firstpage
    1878
  • Lastpage
    1886
  • Abstract
    Distributed storage systems provide large-scale reliable data storage services by spreading redundancy across a large group of storage nodes. In such big systems, node failures take place on a regular basis. When a node fails or leaves the system, the redundant data should be regenerated at a replacement node as soon as possible to maintain the same level of redundancy. Previous studies aim to minimize the network traffic in the regeneration process, but in practical networks, where link capacities vary widely, minimizing network traffic does not always mean minimizing regeneration time. Considering heterogeneous link capacities, Li et al. proposed a tree-structured regeneration scheme, called RCTREE, to bypass low-capacity links encountered in direct transmissions. However, we find that RCTREE may rapidly lose data integrity after several regenerations. In this paper, we reconsider the problem of minimizing regeneration time in networks with heterogeneous link capacities. We derive the minimum amount of data to be transmitted through each link to preserve data integrity. We prove that building an optimal regeneration tree is NP-complete and propose a heuristic algorithm for a near-optimal solution. We further introduce a flexible regeneration scheme, which allows providers to generate different amounts of coded data. Simulation results show that the flexible tree-structured regeneration scheme can reduce the regeneration time significantly.
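The abstract's central observation, that with heterogeneous link capacities the slowest link rather than total traffic sets the regeneration time, can be illustrated with a minimal sketch. It assumes a simplified model that is not the paper's: every provider or tree edge carries the same beta units of data, links transmit concurrently with pipelining, and time is therefore the bottleneck of beta / capacity. The tree is built with Prim's algorithm as a maximum spanning tree rooted at the newcomer, a standard way to minimize the bottleneck edge. The names `star_time`, `max_spanning_tree`, `tree_time`, `CAP`, and the capacity values are hypothetical. Per the abstract, forwarding only beta per edge is exactly what can make RCTREE lose data integrity; the paper's per-link minimum amounts are not modeled here.

```python
import heapq

# Hypothetical example capacities (undirected links) between newcomer "n"
# and providers "a", "b", "c"; illustrative values, not from the paper.
CAP = {frozenset(k): v for k, v in {
    ("n", "a"): 10, ("n", "b"): 10, ("n", "c"): 1,
    ("a", "b"): 5, ("a", "c"): 8, ("b", "c"): 2,
}.items()}


def star_time(beta, cap, newcomer, providers):
    """Direct (star) regeneration: each provider sends beta units straight to
    the newcomer over parallel links, so the slowest link sets the time."""
    return max(beta / cap[frozenset((p, newcomer))] for p in providers)


def max_spanning_tree(nodes, cap, root):
    """Prim's algorithm on capacities: grow a tree from root, always adding
    the highest-capacity edge to a node not yet in the tree."""
    in_tree = {root}
    edges = []
    heap = [(-cap[frozenset((root, v))], root, v) for v in nodes if v != root]
    heapq.heapify(heap)
    while heap and len(in_tree) < len(nodes):
        _, u, v = heapq.heappop(heap)
        if v in in_tree:
            continue  # stale entry: v was already reached by a better edge
        in_tree.add(v)
        edges.append((u, v))
        for w in nodes:
            if w not in in_tree:
                heapq.heappush(heap, (-cap[frozenset((v, w))], v, w))
    return edges


def tree_time(beta, cap, edges):
    """Simplified tree regeneration (assumption): every tree edge carries
    beta units concurrently, so the bottleneck edge sets the time."""
    return max(beta / cap[frozenset(e)] for e in edges)


if __name__ == "__main__":
    nodes = ["n", "a", "b", "c"]
    tree = max_spanning_tree(nodes, CAP, "n")
    print(star_time(4, CAP, "n", ["a", "b", "c"]))  # 4.0: bottlenecked by n-c
    print(tree_time(4, CAP, tree))                  # 0.5: c relays via a
```

In this toy case, routing provider c through a avoids the capacity-1 link to the newcomer. The abstract's point is that a correct scheme must also decide how much data each link carries, which is what makes optimal tree construction NP-complete.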
  • Keywords
    computer network reliability; data integrity; distributed processing; redundancy; storage area networks; storage management; telecommunication links; telecommunication traffic; trees (mathematics); NP-complete optimal regeneration tree; RCTREE; distributed storage systems; flexible tree-structured regeneration scheme; heterogeneity-aware data regeneration; heterogeneous link capacities; heuristic algorithm; large-scale reliable data storage services; low-capacitated link; network traffic; node failures; redundant data regeneration; regeneration process; regeneration time minimization; storage nodes; Bandwidth; Conferences; Distributed databases; Maintenance engineering; Overlay networks; Peer-to-peer computing; Topology
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    INFOCOM, 2014 Proceedings IEEE
  • Conference_Location
    Toronto, ON
  • Type
    conf
  • DOI
    10.1109/INFOCOM.2014.6848127
  • Filename
    6848127