DocumentCode :
1994807
Title :
Workload characteristics and resource aware Hadoop scheduler
Author :
Divya, M. ; Annappa, B.
Author_Institution :
Nat. Inst. of Technol. Karnataka, Surathkal, India
fYear :
2015
fDate :
9-11 July 2015
Firstpage :
163
Lastpage :
168
Abstract :
Hadoop MapReduce is one of the most widely used platforms for large-scale data processing. A Hadoop cluster can contain machines with different resources, including memory size, CPU capability, and disk space. This heterogeneity raises the challenging research issue of improving Hadoop's performance through proper resource provisioning. The work presented in this paper focuses on optimizing job scheduling in Hadoop. A Workload Characteristic and Resource Aware (WCRA) Hadoop scheduler is proposed that classifies jobs as CPU bound or disk I/O bound. Based on their performance, nodes in the cluster are classified as CPU busy or disk I/O busy. Before a job is scheduled on a node, the scheduler ensures that more than 25% of the node's primary memory is available. Performance parameters of Map tasks, such as the time required to parse the data, map, sort, and merge the results, and of Reduce tasks, such as the time to merge, parse, and reduce, are considered when categorizing a job as CPU bound or disk I/O bound. Tasks are assigned priority based on their minimum Estimated Completion Time. Jobs are scheduled on a compute node in such a way that jobs already running on it are not affected. Experimental results show a 30% improvement in performance compared to Hadoop's FIFO, Fair, and Capacity schedulers.
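The scheduling policy described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the classification formula, so the split of phase times into `cpu_time` and `io_time`, the rule that a job goes to a node busy with the other resource type, and all class and function names (`Job`, `Node`, `pick_node`, `schedule`) are assumptions made for illustration. Only the 25% free-memory threshold and the ordering by minimum estimated completion time come from the abstract.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Threshold from the abstract: more than 25% of primary memory must be free.
MIN_FREE_MEMORY = 0.25


@dataclass
class Job:
    name: str
    cpu_time: float  # assumed: time in CPU-heavy phases (seconds)
    io_time: float   # assumed: time in disk-heavy phases (seconds)
    estimated_completion_time: float

    @property
    def kind(self) -> str:
        # Assumed rule: the dominant phase time decides the class.
        return "cpu" if self.cpu_time >= self.io_time else "io"


@dataclass
class Node:
    name: str
    busy: str  # "cpu" (CPU busy) or "io" (disk I/O busy)
    free_memory_fraction: float


def pick_node(job: Job, nodes: List[Node]) -> Optional[Node]:
    """Return the first node with enough free memory whose busy resource
    differs from the job's bottleneck (assumed matching rule)."""
    for node in nodes:
        if node.free_memory_fraction > MIN_FREE_MEMORY and node.busy != job.kind:
            return node
    return None


def schedule(jobs: List[Job], nodes: List[Node]) -> List[Tuple[str, Optional[str]]]:
    """Place jobs in order of increasing estimated completion time."""
    plan = []
    for job in sorted(jobs, key=lambda j: j.estimated_completion_time):
        node = pick_node(job, nodes)
        plan.append((job.name, node.name if node else None))
    return plan


if __name__ == "__main__":
    jobs = [Job("sort", 10, 40, 50), Job("wordcount", 30, 5, 20)]
    nodes = [Node("n1", "cpu", 0.5), Node("n2", "io", 0.1), Node("n3", "io", 0.6)]
    print(schedule(jobs, nodes))
```

In this sketch a CPU-bound job avoids CPU-busy nodes and a disk I/O-bound job avoids disk I/O-busy nodes, which is one plausible way to keep already running jobs unaffected. The sketch does not update node state after a placement; a real scheduler would need to do so.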
Keywords :
data handling; parallel processing; scheduling; CPU bound; CPU capability; Hadoop FIFO scheduler; Hadoop MapReduce; Hadoop cluster; WCRA scheduler; capacity scheduler; disk I/O bound; disk space; fair scheduler; job scheduling optimization; large scale data processing; memory size; minimum estimated completion time; workload characteristic and resource aware Hadoop scheduler; Data processing; Mathematical model; Memory management; Monitoring; Partitioning algorithms; Processor scheduling; Resource management; Hadoop; Job Scheduling; Resource Awareness;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Recent Trends in Information Systems (ReTIS), 2015 IEEE 2nd International Conference on
Conference_Location :
Kolkata
Type :
conf
DOI :
10.1109/ReTIS.2015.7232871
Filename :
7232871