• DocumentCode
    1269177
  • Title

    Slicing Distributed Systems

  • Author

    Gramoli, Vincent ; Vigfusson, Ymir ; Birman, Ken ; Kermarrec, Anne-Marie ; Van Renesse, Robbert

  • Author_Institution
    EPFL LPD, Univ. of Neuchatel, Lausanne, Switzerland
  • Volume
    58
  • Issue
    11
  • fYear
    2009
  • Firstpage
    1444
  • Lastpage
    1455
  • Abstract
    Peer-to-peer (P2P) architectures are popular for tasks such as collaborative download, VoIP telephony, and backup. To maximize performance in the face of widely variable storage capacities and bandwidths, such systems typically need to shift work from poor nodes to richer ones. Similar requirements are seen in today's large data centers, where machines may have widely variable configurations, loads, and performance. In this paper, we consider the slicing problem, which involves partitioning the participating nodes into k subsets using a one-dimensional attribute, and updating the partition as the set of nodes and their associated attributes change. The mechanism thus facilitates the development of adaptive systems. We begin by motivating this problem statement and reviewing prior work. Existing algorithms are shown to have problems with convergence, manifesting as inaccurate slice assignments, and to adapt slowly as conditions change. Our protocol, Sliver, has provably rapid convergence, is robust under stress and is simple to implement. We present both theoretical and experimental evaluations of the protocol.
  • Keywords
    distributed algorithms; fault tolerant computing; graph theory; network theory (graphs); peer-to-peer computing; protocols; randomised algorithms; set theory; Sliver protocol; VoIP telephony; adaptive system development; backup task; collaborative download; data center; distributed system slicing algorithm; fault tolerance; network bandwidth; node partitioning; one-dimensional attribute subset; peer-to-peer architecture; performance maximization; randomized algorithm; storage capacity; Adaptive systems; Bandwidth; Collaborative work; Convergence; Internet telephony; Partitioning algorithms; Peer to peer computing; Protocols; Robustness; Stress; Distributed systems; performance evaluation of algorithms and systems
  • fLanguage
    English
  • Journal_Title
    Computers, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0018-9340
  • Type

    jour

  • DOI
    10.1109/TC.2009.111
  • Filename
    5184813