Title :
Keep it moving: Proactive workload management for reducing SLA violations in large scale SaaS clouds
Author :
Roy, Anirban ; Ganesan, Rajeshwari ; Sarkar, Santonu
Author_Institution :
Next Gen Comput. Lab., Electron. City, Bangalore, India
Abstract :
Software failures, workload-related failures, and job overload conditions cause SLA violations in software-as-a-service (SaaS) systems. Existing work does not fully address the mitigation of SLA violations: (i) none of it addresses mitigation in business-specific scenarios (SaaS, in our case); (ii) some approaches do not address software and workload-related failures, while others do not comprehensively address the selection of a target physical machine (PM) for workload migration, leaving out vital considerations such as workload compatibility checks between the migrating VM and the VMs already on the target PM; and (iii) a clear mathematical mapping between workload, resource demand, and SLA is lacking. In this paper, we present the Keep It Moving (KIM) software framework for the cloud controller, which helps minimize service failures caused by violations of availability, utilization, and response-time SLAs in SaaS cloud data centers. Although we treat migration as the primary mitigation technique, we also try to mitigate SLA violations without migration. To do so, we perform a capacity check on the host PM before migrating, to determine whether the current PM has enough capacity to address the upcoming SLA violations by restart/reboot or VM resizing. In certain cases, such as workload-related failures due to corrupt files, we prefer rerouting the workload to a replica VM over migration. We formulate the selection of a target PM as a multi-objective optimization problem. We validate the proposed approach using a trace-based discrete event simulation of a virtualized data center in which failure and workload characteristics are simulated from data extracted from real SaaS business server logs. We found that a 60% reduction in SLA violations is possible with our approach, along with an approximately 10% reduction in VM downtime.
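The mitigation order described in the abstract (reroute to a replica for corrupt-file failures, otherwise check the current host's capacity for an in-place restart or resize, otherwise migrate to a compatible target PM chosen by multi-objective optimization) can be sketched as below. This is a minimal illustration under stated assumptions, not the KIM framework itself: the classes `VM` and `PM`, the `INCOMPATIBLE` rule, the `failure_risk` field, the three objectives, and the default weights are all hypothetical. The multi-objective step is approximated here by weighted-sum scalarization, which is only one of several ways to combine objectives and may differ from the paper's formulation.

```python
from dataclasses import dataclass, field

# Hypothetical model: class names, fields, weights, and the compatibility
# rule are illustrative assumptions, not the KIM paper's formulation.

@dataclass
class VM:
    name: str
    cpu_demand: float            # current CPU demand (cores)
    mem_demand: float            # current memory demand (GB)
    workload_type: str           # e.g. "web", "db", "batch"
    failure_cause: str = ""      # e.g. "corrupt_file"; empty if none predicted
    has_replica: bool = False

@dataclass
class PM:
    name: str
    cpu_capacity: float
    mem_capacity: float
    vms: list = field(default_factory=list)
    failure_risk: float = 0.0    # hypothetical predicted failure probability in [0, 1]

    def free_cpu(self) -> float:
        return self.cpu_capacity - sum(v.cpu_demand for v in self.vms)

    def free_mem(self) -> float:
        return self.mem_capacity - sum(v.mem_demand for v in self.vms)

# Hypothetical rule: workload-type pairs assumed to interfere when co-located.
INCOMPATIBLE = {frozenset({"db", "batch"})}

def compatible(vm: VM, pm: PM) -> bool:
    """Compatibility check between the migrating VM and every VM on the target PM."""
    return all(frozenset({vm.workload_type, o.workload_type}) not in INCOMPATIBLE
               for o in pm.vms)

def mitigate(vm, host, extra_cpu, extra_mem, candidates, weights=(0.5, 0.3, 0.2)):
    """Choose a mitigation for a VM whose demand is predicted to grow by (extra_cpu, extra_mem)."""
    # 1. Workload-related failure such as a corrupt file: prefer rerouting to a replica.
    if vm.failure_cause == "corrupt_file" and vm.has_replica:
        return ("reroute_to_replica", None)
    # 2. Capacity check on the current host: restart/resize in place if headroom exists.
    if host.free_cpu() >= extra_cpu and host.free_mem() >= extra_mem:
        return ("restart_or_resize_in_place", host.name)
    # 3. Otherwise migrate. Weighted sum of three objectives to minimize:
    #    post-migration CPU utilization, post-migration memory utilization, failure risk.
    need_cpu, need_mem = vm.cpu_demand + extra_cpu, vm.mem_demand + extra_mem
    w_cpu, w_mem, w_risk = weights
    best, best_score = None, float("inf")
    for pm in candidates:
        if pm is host or not compatible(vm, pm):
            continue
        if pm.free_cpu() < need_cpu or pm.free_mem() < need_mem:
            continue  # infeasible: not enough capacity for the grown VM
        cpu_util = (pm.cpu_capacity - pm.free_cpu() + need_cpu) / pm.cpu_capacity
        mem_util = (pm.mem_capacity - pm.free_mem() + need_mem) / pm.mem_capacity
        score = w_cpu * cpu_util + w_mem * mem_util + w_risk * pm.failure_risk
        if score < best_score:
            best, best_score = pm, score
    return ("migrate", best.name) if best is not None else ("no_feasible_target", None)
```

For example, a host with 8 cores running VMs that use 7 has only 1 core free, so a `db` VM that needs 2 more cores fails the in-place capacity check; candidates hosting a `batch` VM are then skipped by the compatibility check, and the remaining feasible PM with the lowest weighted score is returned. A Pareto-front or constraint-based method could replace the weighted sum if the objectives should not be traded off linearly.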
Keywords :
cloud computing; optimisation; software reliability; Keep It Moving software framework; SLA violations reduction; cloud controller; job overload conditions; large scale SaaS clouds; mathematical mapping; multiobjective optimization problem; primary mitigation technique; proactive workload management; software failures; software-as-a-service systems; trace-based discrete event simulation; virtualized data center; workload-related failures; Availability; Business; Databases; Servers; Software as a service; Time factors; SLA violation; application logs; business SaaS data center; failures; multi-objective optimization;
Conference_Title :
2013 IEEE 24th International Symposium on Software Reliability Engineering (ISSRE)
Conference_Location :
Pasadena, CA
DOI :
10.1109/ISSRE.2013.6698895