Title :
PRISM: Fine-Grained Resource-Aware Scheduling for MapReduce
Author :
Zhang, Qi ; Zhani, Mohamed Faten ; Yang, Yuke ; Boutaba, Raouf ; Wong, Bernard
Author_Institution :
David R. Cheriton Sch. of Comput. Sci., Univ. of Waterloo, Waterloo, ON, Canada
fDate :
April-June 1 2015
Abstract :
MapReduce has become a popular model for data-intensive computation in recent years. By breaking down each job into small map and reduce tasks and executing them in parallel across a large number of machines, MapReduce can significantly reduce the running time of data-intensive jobs. However, despite recent efforts toward designing resource-efficient MapReduce schedulers, existing solutions that focus on scheduling at the task level still offer sub-optimal job performance. This is because tasks can have highly varying resource requirements during their lifetime, which makes it difficult for task-level schedulers to effectively utilize available resources to reduce job execution time. To address this limitation, we introduce PRISM, a fine-grained resource-aware MapReduce scheduler that divides tasks into phases, where each phase has a constant resource usage profile, and performs scheduling at the phase level. We first demonstrate the importance of phase-level scheduling by showing the resource usage variability within the lifetime of a task using a wide range of MapReduce jobs. We then present a phase-level scheduling algorithm that improves execution parallelism and resource utilization without introducing stragglers. In a 10-node Hadoop cluster running standard benchmarks, PRISM offers high resource utilization and provides a 1.3× improvement in job running time compared to the current Hadoop schedulers.
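To make the task-level versus phase-level distinction concrete, here is a minimal Python sketch under stated assumptions. The `Phase` type, the phase names, durations and CPU figures are all hypothetical and are not taken from PRISM; the sketch only compares how much capacity is reserved when a scheduler must hold a task's peak demand for the task's whole lifetime versus holding each phase's own constant demand only while that phase runs. It is an illustration of the idea, not PRISM's scheduling algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    """One phase of a task with a constant resource demand (hypothetical units)."""
    name: str
    duration: int  # time slots the phase runs for
    cpu: float     # CPU cores demanded while the phase runs


def task_level_reservation(phases):
    """Task-level view: reserve the task's peak demand for its entire lifetime."""
    peak = max(p.cpu for p in phases)
    total_time = sum(p.duration for p in phases)
    return peak * total_time  # core-slots reserved


def phase_level_reservation(phases):
    """Phase-level view: reserve each phase's own demand only while it runs."""
    return sum(p.cpu * p.duration for p in phases)


def reserved_but_idle(phases):
    """Core-slots held by a task-level reservation that the task never uses."""
    return task_level_reservation(phases) - phase_level_reservation(phases)


# Hypothetical map task whose demand varies across three phases.
MAP_TASK = [
    Phase("read_input", duration=2, cpu=0.5),
    Phase("map_function", duration=4, cpu=1.0),
    Phase("spill_merge", duration=2, cpu=0.5),
]

if __name__ == "__main__":
    print("task-level reservation :", task_level_reservation(MAP_TASK))
    print("phase-level reservation:", phase_level_reservation(MAP_TASK))
    print("reserved but idle      :", reserved_but_idle(MAP_TASK))
```

With these made-up numbers, a task-level reservation holds 8.0 core-slots while the phases actually need 6.0, leaving 2.0 core-slots that a phase-level scheduler could assign to a phase of another task.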
Keywords :
data handling; parallel processing; pattern clustering; resource allocation; scheduling; 10-node Hadoop cluster; MapReduce jobs; PRISM; constant resource usage profile; data-intensive computation; data-intensive jobs; execution parallelism; fine-grained resource-aware MapReduce scheduler; fine-grained resource-aware scheduling; job execution time; phase-level scheduling; phase-level scheduling algorithm; resource requirements; resource utilization; resource-efficient MapReduce schedulers; suboptimal job performance; Cloud computing; Resource management; Schedules; Scheduling algorithms; Hadoop; MapReduce
Journal_Title :
IEEE Transactions on Cloud Computing
DOI :
10.1109/TCC.2014.2379096