DocumentCode :
2989612
Title :
Towards self-caring mapreduce: Proactively reducing fault-induced execution-time penalties
Author :
Kadirvel, Selvi ; Fortes, José A B
Author_Institution :
Adv. Comput. & Inf. Syst. Lab., Univ. of Florida, Gainesville, FL, USA
fYear :
2011
fDate :
4-8 July 2011
Firstpage :
63
Lastpage :
71
Abstract :
Self-caring IT systems are those that can proactively avoid system failures rather than reactively handle failures after they have occurred. In this paper, we are interested in failures in which a MapReduce job is unable to execute within an SLA-based completion time. The existing fault-tolerance capability provided by MapReduce frameworks is simple, and the penalty associated with handling failures could potentially lead to excessive job execution times. Our goal in this paper is to bring out the severity of this penalty for different job characteristics and configurable framework parameters. We first quantitatively evaluate the penalty in execution time associated with node failures in the open-source MapReduce framework, Hadoop, using the MRPerf simulator. This increase in execution time is particularly expensive in pay-as-you-go cloud infrastructures, where users are charged by resource usage duration. Our solution minimizes job-completion-time SLA violations by augmenting the existing fault-tolerance capability of the MapReduce framework using a dynamic resource scaling approach. This resource scaling approach leverages the elastic properties of a cloud in order to mitigate execution-time penalties and hence proactively avoid a potential job failure. Using our proposed approach for various job and framework parameters, we show that performance penalties can be decreased by up to 78% in the case of single-node failures and by up to 100% in the case of 4-node failures at minimal additional cost.
Keywords :
cloud computing; fault tolerant computing; public domain software; resource allocation; MRPerf simulator; SLA-based completion time; configurable framework parameter; dynamic resource scaling approach; elastic property; failure handling; fault tolerance capability; fault-induced execution-time penalty reduction; fault-tolerance capability; open-source MapReduce framework; pay-as-you-go cloud infrastructures; resource usage duration; self-caring IT system; self-caring MapReduce; Distributed databases; Dynamic scheduling; Fault tolerance; Fault tolerant systems; Google; Organizations; Runtime; MapReduce; autonomic computing; failure; scaling; self-caring
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
High Performance Computing and Simulation (HPCS), 2011 International Conference on
Conference_Location :
Istanbul
Print_ISBN :
978-1-61284-380-3
Type :
conf
DOI :
10.1109/HPCSim.2011.5999808
Filename :
5999808