Title :
LsPS: A Job Size-Based Scheduler for Efficient Task Assignments in Hadoop
Author :
Yi Yao ; Jianzhe Tai ; Bo Sheng ; Ningfang Mi
Author_Institution :
Dept. of Electr. & Comput. Eng., Northeastern Univ., Boston, MA, USA
Abstract :
The MapReduce paradigm and its open source implementation Hadoop are emerging as an important standard for large-scale data-intensive processing in both industry and academia. A MapReduce cluster is typically shared among multiple users with different types of workloads. When many jobs are submitted concurrently to a MapReduce cluster, they compete for the shared resources, and the overall system performance, in terms of job response times, might be seriously degraded. Therefore, one challenging issue is how to schedule jobs efficiently in such a shared MapReduce environment. However, we find that the conventional scheduling algorithms supported by Hadoop cannot always guarantee good average response times under different workloads. To address this issue, we propose a new Hadoop scheduler, which leverages knowledge of workload patterns to reduce average job response times by dynamically tuning the resource shares among users and the scheduling algorithm for each user. Both simulation and real experimental results from an Amazon EC2 cluster show that our scheduler reduces the average MapReduce job response time under a variety of system workloads compared to the existing FIFO and Fair schedulers.
Keywords :
parallel processing; processor scheduling; public domain software; resource allocation; Amazon EC2 cluster; FIFO; LsPS; MapReduce job response time; fair scheduler; job size-based scheduler; large-scale data-intensive processing; open source implementation Hadoop; shared MapReduce environment; Cloud computing; Monitoring; Scheduling algorithms; Time factors; Time measurement; Hadoop; Heavy-tailed workloads; MapReduce; bursty workloads; scheduling;
Journal_Title :
Cloud Computing, IEEE Transactions on
DOI :
10.1109/TCC.2014.2338291