Cost aware fault recovery in clouds

Author

Israel, Assaf ; Raz, Danny

Author_Institution

Technion - Israel Inst. of Technol., Haifa, Israel

fYear

2013

fDate

27-31 May 2013

Firstpage

9

Lastpage

17

Abstract

Maintaining high availability of IaaS services at a reasonable cost is a challenging task that has received recent attention due to the growing popularity of Cloud computing as a preferred means of affordable IT outsourcing. In large data centers, faults are bound to happen, and thus the only reasonable cost-effective method of providing high availability of services is an SLA-aware recovery plan; that is, a mapping of the service VMs onto backup machines where they can be executed in case of a failure. The recovery process may benefit from powering on some of these machines in advance, since redeployment on powered machines is much faster. However, this comes with an additional maintenance cost, so the real problem is how to balance the expected improvement in recovery time against the cost of machine activation. We model this problem as an offline optimization problem and present a bicriteria approximation algorithm for it. While this is the first performance-guaranteed algorithm for this problem, it is somewhat complex to implement in practice. Thus, we further present a much simpler and more practical heuristic based on a greedy approach. We evaluate the performance of this heuristic over real data-center data, and show that it performs well in terms of scale, hierarchical faults and variant costs. Our results indicate that our scheme can reduce the overall recovery costs by 10-15% when compared to currently used approaches. We also show that fault-recovery-cost-aware VM placement may further help reduce the expected recovery costs, as it can reduce the backup machine activation costs.
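
The trade-off described in the abstract (activation cost of pre-powered backup machines versus expected recovery cost) can be illustrated with a toy greedy sketch. This is not the paper's heuristic or its bicriteria algorithm, whose details are not given in this record. The cost model, the class names `BackupMachine` and `VM`, and the functions `greedy_preactivation` and `total_expected_cost` are all assumptions made for illustration: each VM has one fixed backup machine, an independent failure probability, and a lower recovery penalty when its backup is already powered on.

```python
from dataclasses import dataclass


@dataclass
class BackupMachine:
    name: str
    activation_cost: float  # assumed cost of keeping the machine powered on in advance


@dataclass
class VM:
    name: str
    backup: str          # backup machine assigned by the (given) recovery plan
    failure_prob: float  # assumed probability that the VM needs recovery
    cold_penalty: float  # recovery cost if the backup must be powered on first
    warm_penalty: float  # recovery cost if the backup is already powered on


def expected_saving(vm):
    """Expected recovery-cost reduction from pre-powering this VM's backup."""
    return vm.failure_prob * (vm.cold_penalty - vm.warm_penalty)


def greedy_preactivation(machines, vms):
    """Repeatedly power on the machine with the largest positive net gain."""
    def net_gain(machine):
        saved = sum(expected_saving(v) for v in vms if v.backup == machine.name)
        return saved - machine.activation_cost

    powered = set()
    remaining = {m.name: m for m in machines}
    while remaining:
        best = max(remaining.values(), key=net_gain)
        if net_gain(best) <= 0:
            break  # no remaining machine pays for its own activation
        powered.add(best.name)
        del remaining[best.name]
    return powered


def total_expected_cost(machines, vms, powered):
    """Activation cost of pre-powered machines plus expected recovery cost."""
    activation = sum(m.activation_cost for m in machines if m.name in powered)
    recovery = sum(
        v.failure_prob * (v.warm_penalty if v.backup in powered else v.cold_penalty)
        for v in vms
    )
    return activation + recovery


machines = [BackupMachine("b1", 1.0), BackupMachine("b2", 5.0)]
vms = [
    VM("v1", "b1", 0.10, 100.0, 20.0),
    VM("v2", "b1", 0.05, 100.0, 20.0),
    VM("v3", "b2", 0.01, 100.0, 20.0),
]
powered = greedy_preactivation(machines, vms)
print(sorted(powered))
print(total_expected_cost(machines, vms, powered))
print(total_expected_cost(machines, vms, set()))
```

In this made-up instance only `b1` is worth powering on in advance, because the expected savings of the VMs mapped to `b2` do not cover its activation cost. Under the independence assumption used here the greedy loop reduces to a per-machine threshold test; a realistic model with shared capacity, hierarchical faults, or SLA constraints would not decompose this way.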

Keywords

approximation theory; cloud computing; cost reduction; greedy algorithms; optimisation; performance evaluation; virtual machines; IT outsourcing; IaaS services; SLA aware recovery plan; backup machine activation cost reduction; backup machines; bicriteria approximation algorithm; cost aware fault recovery; cost-effective method; data center faults; fault recovery cost aware VM placement; hierarchical faults; maintenance cost; offline optimization problem; performance guaranteed algorithm; powered machines; real data center data; recovery cost reduction; recovery time improvement; Approximation algorithms; Approximation methods; Availability; Hardware; Linear systems; Maintenance engineering; Optimization

fLanguage

English

Publisher

IEEE

Conference_Title

Integrated Network Management (IM 2013), 2013 IFIP/IEEE International Symposium on

Conference_Location

Ghent

Print_ISBN

978-1-4673-5229-1

Type

conf

Filename

6572964