• DocumentCode
    187008
  • Title
    Improving Cloud Service Resilience Using Brownout-Aware Load-Balancing
  • Author
    Klein, Cristian ; Papadopoulos, Alessandro Vittorio ; Dellkrantz, Manfred ; Durango, Jonas ; Maggio, Martina ; Arzen, Karl-Erik ; Hernandez-Rodriguez, Francisco ; Elmroth, Erik

  • Author_Institution
    Dept. of Comput. Sci., Umeå Univ., Umeå, Sweden
  • fYear
    2014
  • fDate
    6-9 Oct. 2014
  • Firstpage
    31
  • Lastpage
    40
  • Abstract
    We focus on improving the resilience of cloud services (e.g., e-commerce websites) when correlated or cascading failures lead to a shortage of computing capacity. We study how to extend the classical cloud service architecture, composed of a load-balancer and replicas, with a recently proposed self-adaptive paradigm called brownout. Brownout-enabled services can reduce their capacity requirements by degrading user experience (e.g., disabling recommendations). Combining resilience with the brownout paradigm is to date an open practical problem. The issue is to ensure that replica self-adaptivity does not confuse the load-balancing algorithm into overloading replicas that are already struggling with a capacity shortage. For example, load-balancing strategies based on response times cannot decide which replicas to select, since response times are already controlled by the brownout paradigm. In this paper we propose two novel brownout-aware load-balancing algorithms. To test their practical applicability, we extended the popular lighttpd web server and load-balancer, thus obtaining a production-ready implementation. Experimental evaluation shows that the approach enables cloud services to remain responsive despite cascading failures. Moreover, compared to Shortest Queue First (SQF), believed to be near-optimal in the non-adaptive case, our algorithms improve user experience by 5%, with high statistical significance, while preserving response time predictability.
  • Keywords
    cloud computing; file servers; power aware computing; resource allocation; SQF; Web server; brownout aware load balancing algorithm; cloud service architecture; cloud service resilience; computing capacity shortage; load balancer; replica self-adaptivity; self-adaptive paradigm; shortest queue first; user experience; Algorithm design and analysis; Computer architecture; Generators; Load modeling; Power system faults; Resilience; Time factors; cloud; control theory; load-balancing; self-adaptation; statistical evaluation
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Reliable Distributed Systems (SRDS), 2014 IEEE 33rd International Symposium on
  • Conference_Location
    Nara
  • Type
    conf
  • DOI
    10.1109/SRDS.2014.14
  • Filename
    6983377