• DocumentCode
    3678331
  • Title
    Machines Tuning Machines: Configuring Distributed Stream Processors with Bayesian Optimization
  • Author
    Lorenz Fischer;Shen Gao;Abraham Bernstein
  • Author_Institution
    Dept. of Inf., Univ. of Zurich, Zurich, Switzerland
  • fYear
    2015
  • Firstpage
    22
  • Lastpage
    31
  • Abstract
    Modern distributed computing frameworks such as Apache Hadoop, Spark, or Storm distribute the workload of applications across a large number of machines. Whilst they abstract the details of distribution, they do require the programmer to set a number of configuration parameters before deployment. These parameter settings usually have a substantial impact on execution efficiency. Finding the right values for these parameters is considered a difficult task and requires domain, application, and framework expertise. In this paper, we propose a machine learning approach to the problem of configuring a distributed computing framework. Specifically, we propose using Bayesian Optimization to find good parameter settings. In an extensive empirical evaluation, we show that Bayesian Optimization can effectively find good parameter settings for four different stream processing topologies implemented in Apache Storm, resulting in significant gains over a parallel linear approach.
  • Keywords
    "Topology","Bayes methods","Optimization","Storms","Parallel processing","Fasteners","Message systems"
  • Publisher
    IEEE
  • Conference_Titel
    Cluster Computing (CLUSTER), 2015 IEEE International Conference on
  • Type
    conf
  • DOI
    10.1109/CLUSTER.2015.13
  • Filename
    7307560