Regional Information Center for Science and Technology - Configuring a MapReduce Framework for Performance-Heterogeneous Clusters

DocumentCode :

249323

Title :

Configuring a MapReduce Framework for Performance-Heterogeneous Clusters

Author :

Hartog, J. ; Delvalle, Renan ; Govindaraju, M. ; Lewis, Michael J.

Author_Institution :

Dept. of Comput. Sci., State Univ. of New York at Binghamton, Binghamton, NY, USA

fYear :

2014

fDate :

June 27, 2014 - July 2, 2014

Firstpage :

120

Lastpage :

127

Abstract :

When data centers employ the common and economical practice of upgrading subsets of nodes incrementally, rather than replacing or upgrading all nodes at once, they end up with clusters whose nodes have non-uniform processing capability, which we also call performance-heterogeneity. Popular frameworks supporting the effective MapReduce programming model for Big Data applications do not flexibly adapt to these environments. Instead, existing MapReduce frameworks, including Hadoop, typically divide data evenly among worker nodes, thereby inducing the well-known problem of stragglers on slower nodes. Our alternative MapReduce framework, called MARLA, divides each worker's labor into sub-tasks, delays the binding of data to worker processes, and thereby enables applications to run faster in performance-heterogeneous environments. This approach does introduce overhead, however. We explore and characterize the opportunity for performance gains, and identify when the benefits outweigh the costs. Our results suggest that frameworks should support finer grained sub-tasking and dynamic data partitioning when running on some performance-heterogeneous clusters. Blindly taking this approach in homogeneous clusters can slow applications down. Our study further suggests the opportunity for cluster managers to build performance-heterogeneous clusters by design, if they also run MapReduce frameworks that can exploit them.
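The trade-off the abstract describes (even, up-front data division versus finer-grained sub-tasks bound to workers late) can be illustrated with a toy simulation. This is a minimal sketch and not MARLA's implementation: the function names, the node speeds, the sub-task count, and the fixed per-sub-task `overhead` are all hypothetical values chosen for illustration.

```python
import heapq


def static_makespan(total_work, speeds):
    """Even split: every node is assigned total_work / n units up front.

    The job finishes when the slowest node finishes its share.
    """
    share = total_work / len(speeds)
    return max(share / s for s in speeds)


def dynamic_makespan(total_work, speeds, num_subtasks, overhead=0.0):
    """Late binding: whichever node becomes free first pulls the next sub-task.

    `overhead` is a hypothetical fixed cost (in time units) paid per sub-task.
    """
    chunk = total_work / num_subtasks
    # Min-heap of (time at which the node becomes free, node index).
    free_at = [(0.0, i) for i in range(len(speeds))]
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(num_subtasks):
        t, i = heapq.heappop(free_at)
        done = t + chunk / speeds[i] + overhead
        finish = max(finish, done)
        heapq.heappush(free_at, (done, i))
    return finish


if __name__ == "__main__":
    mixed = [1.0, 1.0, 2.0, 2.0]    # assumed: two slow nodes, two fast nodes
    uniform = [1.0, 1.0, 1.0, 1.0]  # assumed: homogeneous cluster
    print(static_makespan(120, mixed))
    print(dynamic_makespan(120, mixed, num_subtasks=24))
    print(static_makespan(120, uniform))
    print(dynamic_makespan(120, uniform, num_subtasks=24, overhead=0.5))
```

With these made-up numbers, the even split on the mixed cluster takes 30 time units because the slow nodes straggle, while pulling 24 sub-tasks on demand takes 20. On the uniform cluster with a 0.5 per-sub-task overhead, sub-tasking takes 33 against 30 for the even split, which mirrors the abstract's caution about homogeneous clusters. These figures come only from this sketch, not from the paper's measurements.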

Keywords :

Big Data; computer centres; parallel programming; pattern clustering; performance evaluation; Big Data applications; Hadoop; MARLA; MapReduce framework; MapReduce programming model; data centers; dynamic data partitioning; finer grained subtasking; nonuniform processing capability; performance-heterogeneity; performance-heterogeneous clusters; performance-heterogeneous environments; worker labor; Bars; Big data; Data models; Delays; Program processors; Random access memory; Runtime; Big Data; Heterogeneous; MapReduce;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Big Data (BigData Congress), 2014 IEEE International Congress on

Conference_Location :

Anchorage, AK

Print_ISBN :

978-1-4799-5056-0

Type :

conf

DOI :

10.1109/BigData.Congress.2014.26

Filename :

6906769

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=249323