Regional Information Center for Science and Technology - Job scheduling in Hadoop with Shared Input Policy and RAMDISK

DocumentCode :

166708

Title :

Job scheduling in Hadoop with Shared Input Policy and RAMDISK

Author :

Bezerra, Aprigio ; Hernandez, Porfidio ; Espinosa, Antonio ; Moure, Juan Carlos

Author_Institution :

Escola d'Eng., Univ. Autonoma de Barcelona, Bellaterra, Spain

fYear :

2014

fDate :

22-26 Sept. 2014

Firstpage :

355

Lastpage :

363

Abstract :

The Hadoop framework is a successful option for industry and academia to handle Big Data applications. Large input data sets are split into smaller chunks, distributed among the cluster nodes, and processed on the same nodes where they are stored. However, some data-intensive Hadoop applications write a very large volume of intermediate data to the local file system of each node. The large amount of data spilled to disk, combined with concurrent accesses from different tasks executing on the same node, overloads the input/output system. We propose to extend Shared Input Policy, a Hadoop job scheduler policy developed by our research group, by adding a RAMDISK for temporary storage of intermediate data. Shared Input Policy schedules batches of data-intensive jobs that share the same input data set. We add the RAMDISK to improve the performance of Shared Input Policy. A RAMDISK offers high throughput and low latency, which allows quick access to intermediate data and relieves the hard disk. Experimental results show that our approach outperforms the default Hadoop policy by 40% to 60% for data-intensive applications.
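
The batching idea in the abstract (jobs that share the same input data set are scheduled together) can be illustrated with a minimal sketch. This is not the authors' scheduler: the `Job` type, its field names, the job names, and the paths are hypothetical, and the grouping rule shown (exact match on the input path) is an assumption made only for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Job(NamedTuple):
    # Hypothetical job description; field names are illustrative only.
    name: str
    input_path: str


def batch_by_shared_input(jobs):
    """Group jobs so that those reading the same input data set form one batch."""
    batches = defaultdict(list)
    for job in jobs:
        batches[job.input_path].append(job.name)
    # dicts keep insertion order, so batches appear in first-submission order.
    return dict(batches)


# Hypothetical workload: two jobs read the same data set, one reads another.
jobs = [
    Job("align-a", "/data/reads"),
    Job("wordcount", "/data/logs"),
    Job("align-b", "/data/reads"),
]
batches = batch_by_shared_input(jobs)
```

In this sketch the two jobs that read `/data/reads` end up in one batch, so a scheduler built along these lines could read that input once for both jobs instead of once per job.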

Keywords :

Big Data; job shop scheduling; Big Data applications; Hadoop data-intensive applications; Hadoop default policy; Hadoop framework; Hadoop job scheduler policy; RAMDISK; Shared Input Policy; data intensive applications; data storage; data-intensive jobs; job scheduling; shared input policy schedules batches; Bioinformatics; Buffer storage; Dynamic scheduling; Hard disks; Proposals; Random access memory; Schedules; Data Intensive; Hadoop; Intermediate data

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Cluster Computing (CLUSTER), 2014 IEEE International Conference on

Conference_Location :

Madrid

Type :

conf

DOI :

10.1109/CLUSTER.2014.6968788

Filename :

6968788

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=166708