DocumentCode
2667563
Title
CREST: Towards Fast Speculation of Straggler Tasks in MapReduce
Author
Lei, Lei ; Wo, Tianyu ; Hu, Chunming
Author_Institution
State Key Lab. of Software Dev. Environ., Beihang Univ., Beijing, China
fYear
2011
fDate
19-21 Oct. 2011
Firstpage
311
Lastpage
316
Abstract
Data-intensive computing has emerged as the fourth paradigm for modern scientific discoveries. MapReduce, a programming paradigm for large-scale data-parallel applications, is widely applied to web indexing, machine learning, and scientific simulations in industry as well as in academia. Recently, virtualized "utility computing" environments, such as campus clouds, have become an important scenario for running MapReduce jobs. For a MapReduce job, straggler tasks may dominate the response time and delay the whole job. Various speculation schemes have been proposed to alleviate this problem; however, most of them implicitly assume that the time cost of data movement when launching speculative map tasks is trivial, which does not always hold for virtualized Hadoop clusters in a campus cloud. In this paper, we propose a novel approach, CREST (Combination Re-Execution Scheduling Technology), which can achieve the optimal running time for speculative map tasks and decrease the response time of MapReduce jobs. The main idea is that re-executing a combination of tasks on a group of computing nodes may progress faster than directly speculating the straggler task on the target node, due to data locality. The evaluation validates our approach and demonstrates that CREST can reduce the running time of a speculative map task by 70% in the best case and 50% on average, compared with LATE.
Keywords
Internet; cloud computing; educational computing; indexing; learning (artificial intelligence); parallel processing; scheduling; CREST; MapReduce; Web indexing; campus cloud; combination reexecution scheduling technology; data movement; data-intensive computing; large-scale data-parallel applications; machine learning; response time; scientific discoveries; scientific simulations; speculation schemes; speculative map tasks; straggler tasks; virtualized Hadoop clusters; virtualized utility computing environments; Bandwidth; Computational modeling; Distributed databases; Educational institutions; Programming; Time factors; Virtual machining; MapReduce; campus cloud; combination re-execution; complete graph; data locality; straggler task;
fLanguage
English
Publisher
IEEE
Conference_Titel
e-Business Engineering (ICEBE), 2011 IEEE 8th International Conference on
Conference_Location
Beijing
Print_ISBN
978-1-4577-1404-7
Type
conf
DOI
10.1109/ICEBE.2011.37
Filename
6104634