Title :
Scalable Analytics for IaaS Cloud Availability
Author :
Ghosh, Rajesh ; Longo, Federica ; Frattini, Flavio ; Russo, S. ; Trivedi, Kishor S.
Author_Institution :
IBM, Durham, NC, USA
Abstract :
In a large Infrastructure-as-a-Service (IaaS) cloud, component failures are common. Such failures may lead to occasional system downtime and eventual violation of Service Level Agreements (SLAs) on cloud service availability. Availability analysis of the underlying infrastructure helps the service provider design a system capable of meeting a defined SLA, as well as evaluate the capabilities of an existing one. This paper presents a scalable, stochastic model-driven approach to quantify the availability of a large-scale IaaS cloud, where failures are typically dealt with through migration of physical machines among three pools: hot (running), warm (turned on, but not ready), and cold (turned off). Since monolithic models do not scale for large systems, we use an approach based on interacting Markov chains to demonstrate the reduction in analysis complexity and solution time. The three pools are modeled by interacting sub-models. Dependencies among them are resolved using fixed-point iteration, for which the existence of a solution is proved. The analytic-numeric solutions obtained from the proposed approach and from the monolithic model are compared. We show that the errors introduced by interacting sub-models are insignificant and that our approach can handle very large IaaS clouds. A simulation-based solution of the proposed model is also considered, and the solution times of the methods are compared.
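The fixed-point coupling of sub-models described in the abstract can be illustrated with a minimal sketch. It is not the paper's model: it uses only two toy birth-death chains (a hot pool and a warm pool) instead of the three pools, and every rate, pool size, and function name below is a hypothetical choice made for illustration. The hot sub-model needs the probability that a warm machine is available; the warm sub-model needs the replacement demand generated by the hot pool; iteration continues until the shared value stops changing.

```python
def birth_death_steady_state(births, deaths):
    """Steady state of a birth-death CTMC.

    births[i] is the rate from state i to i+1,
    deaths[i] is the rate from state i+1 to i.
    """
    weights = [1.0]
    for b, d in zip(births, deaths):
        weights.append(weights[-1] * b / d)
    total = sum(weights)
    return [w / total for w in weights]


def hot_submodel(p_warm, n_hot=4, lam=0.01, mu_migrate=2.0, mu_repair=0.1):
    """Toy hot pool; state k = number of hot machines currently down.

    All parameters are assumed values. A failed machine is replaced quickly
    (mu_migrate) if a warm machine is available, slowly (mu_repair) otherwise.
    """
    mu_eff = p_warm * mu_migrate + (1.0 - p_warm) * mu_repair
    births = [(n_hot - k) * lam for k in range(n_hot)]
    deaths = [mu_eff] * n_hot
    pi = birth_death_steady_state(births, deaths)
    availability = 1.0 - pi[-1]  # at least one hot machine is up
    demand = sum(p * (n_hot - k) * lam for k, p in enumerate(pi))
    return availability, demand


def warm_submodel(demand, n_warm=2, mu_repair=0.1):
    """Toy warm pool; state j = number of warm machines available."""
    births = [(n_warm - j) * mu_repair for j in range(n_warm)]
    deaths = [demand] * n_warm  # each hot failure consumes one warm machine
    pi = birth_death_steady_state(births, deaths)
    return 1.0 - pi[0]  # probability that some warm machine is available


def solve_fixed_point(tol=1e-10, max_iter=1000):
    """Iterate hot -> warm -> hot until p_warm changes by less than tol."""
    p_warm = 1.0  # initial guess
    for _ in range(max_iter):
        _, demand = hot_submodel(p_warm)
        p_new = warm_submodel(demand)
        if abs(p_new - p_warm) < tol:
            availability, _ = hot_submodel(p_new)
            return availability, p_new
        p_warm = p_new
    raise RuntimeError("fixed-point iteration did not converge")


if __name__ == "__main__":
    availability, p_warm = solve_fixed_point()
    print(f"hot-pool availability: {availability:.10f}")
    print(f"P(warm machine available): {p_warm:.6f}")
```

Each sub-model here has only a handful of states, whereas a monolithic chain over the joint state would grow with the product of the pool sizes; that is the scaling argument the abstract makes, shown on a much smaller example.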
Keywords :
Markov processes; cloud computing; contracts; iterative methods; system monitoring; IaaS cloud availability; Markov chain based approach; SLA; analytic-numeric solutions; cloud service availability; component failures; fixed-point iteration; infrastructure-as-a-service cloud; large-scale IaaS cloud; monolithic models; physical machines; scalable analytics; service level agreements; service provider; stochastic model-driven approach; system downtime; Analytical models; Computational modeling; Failure analysis; Maintenance engineering; Analytic-numeric solution; availability; downtime; simulation; stochastic reward nets
Journal_Title :
Cloud Computing, IEEE Transactions on
DOI :
10.1109/TCC.2014.2310737