• DocumentCode
    1436859
  • Title

    Discovering Statistical Models of Availability in Large Distributed Systems: An Empirical Study of SETI@home

  • Author

    Javadi, Bahman ; Kondo, Daishi ; Vincent, Jean-Marc ; Anderson, David P.

  • Author_Institution
    Comput. Sci. & Software Eng. Dept., Univ. of Melbourne, Melbourne, VIC, Australia
  • Volume
    22
  • Issue
    11
  • fYear
    2011
  • Firstpage
    1896
  • Lastpage
    1903
  • Abstract
    In the age of cloud, Grid, P2P, and volunteer distributed computing, large-scale systems with tens of thousands of unreliable hosts are increasingly common. Invariably, these systems are composed of heterogeneous hosts whose individual availability often exhibits different statistical properties (for example, stationary versus nonstationary behavior) and fits different models (for example, exponential, Weibull, or Pareto probability distributions). In this paper, we describe an effective method for discovering subsets of hosts whose availability has similar statistical properties and can be modeled with similar probability distributions. We apply this method to about 230,000 host availability traces obtained from a real Internet-distributed system, namely SETI@home. We find that about 21 percent of hosts exhibit availability that is a truly random process, and that these hosts can often be modeled accurately with a few distinct distributions from different families. We show that our models are useful and accurate in the context of a scheduling problem that deals with resource brokering. We believe that these methods and models are critical for the design of stochastic scheduling algorithms across large systems where host availability is uncertain.
  • Keywords
    Internet; scheduling; statistical analysis; Internet-distributed system; P2P computing; SETI@home; cloud computing; grid computing; large distributed systems; probability distributions; statistical models; stochastic scheduling algorithms; volunteer distributed computing; Accuracy; Availability; Clustering algorithms; Computational modeling; Context; Measurement; Routing; Statistical availability models; reliability; resource failures; stochastic scheduling
  • fLanguage
    English
  • Journal_Title
    Parallel and Distributed Systems, IEEE Transactions on
  • Publisher
    ieee
  • ISSN
    1045-9219
  • Type

    jour

  • DOI
    10.1109/TPDS.2011.50
  • Filename
    5703090