DocumentCode :
244104
Title :
Managing Tiny Tasks for Data-Parallel, Subsampling Workloads
Author :
Kambhampati, S. ; Kelley, Jaimie ; Stewart, Craig ; Stewart, William C. L. ; Ramnath, Rajiv
fYear :
2014
fDate :
11-14 March 2014
Firstpage :
225
Lastpage :
234
Abstract :
Subsampling workloads compute statistics from a set of observed samples using a random subset of sample data (i.e., a subsample). Data-parallel platforms group these samples into tasks, and each task subsamples its data in parallel. In this paper, we study subsampling workloads that benefit from tiny tasks, i.e., tasks comprising few samples. Tiny tasks reduce processor cache misses caused by random subsampling, which speeds up per-task running time. However, they can also cause significant scheduling overheads that negate the time reduction from reduced cache misses. For example, vanilla Hadoop takes longer to start tiny tasks than to run them. We compared the task scheduling overheads of vanilla Hadoop, lightweight Hadoop setups, and BashReduce. BashReduce, the best platform, outperformed the worst by 3.6X, but scheduling overhead was still 12% of a task's running time. We improved BashReduce's scheduler by allowing it to size tasks according to kneepoints on the miss rate curve. We tested these changes on high-throughput genotype data and on data obtained from Netflix. Our improved BashReduce outperformed vanilla Hadoop by almost 3X and completed short, interactive jobs almost as efficiently as long jobs. These results held at scale and across diverse, heterogeneous hardware.
Keywords :
cache storage; parallel processing; scheduling; statistics; BashReduce scheduler; Netflix; data-parallel platform; high-throughput genotype data; lightweight Hadoop setups; miss rate curve; processor cache misses reduction; subsampling workloads; task scheduling overheads; tiny task management; vanilla Hadoop; Benchmark testing; Bioinformatics; Delays; Genomics; Monitoring; Runtime; Software
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Cloud Engineering (IC2E), 2014 IEEE International Conference on
Conference_Location :
Boston, MA
Type :
conf
DOI :
10.1109/IC2E.2014.94
Filename :
6903477