DocumentCode :
249323
Title :
Configuring a MapReduce Framework for Performance-Heterogeneous Clusters
Author :
Hartog, J.; Delvalle, Renan; Govindaraju, M.; Lewis, Michael J.
Author_Institution :
Dept. of Comput. Sci., State Univ. of New York at Binghamton, Binghamton, NY, USA
fYear :
2014
fDate :
June 27 2014-July 2 2014
Firstpage :
120
Lastpage :
127
Abstract :
When data centers employ the common and economical practice of upgrading subsets of nodes incrementally, rather than replacing or upgrading all nodes at once, they end up with clusters whose nodes have non-uniform processing capability, which we also call performance-heterogeneity. Popular frameworks supporting the effective MapReduce programming model for Big Data applications do not flexibly adapt to these environments. Instead, existing MapReduce frameworks, including Hadoop, typically divide data evenly among worker nodes, thereby inducing the well-known problem of stragglers on slower nodes. Our alternative MapReduce framework, called MARLA, divides each worker's labor into sub-tasks, delays the binding of data to worker processes, and thereby enables applications to run faster in performance-heterogeneous environments. This approach does introduce overhead, however. We explore and characterize the opportunity for performance gains, and identify when the benefits outweigh the costs. Our results suggest that frameworks should support finer-grained sub-tasking and dynamic data partitioning when running on some performance-heterogeneous clusters. Blindly taking this approach in homogeneous clusters can slow applications down. Our study further suggests the opportunity for cluster managers to build performance-heterogeneous clusters by design, if they also run MapReduce frameworks that can exploit them.
Keywords :
Big Data; computer centres; parallel programming; pattern clustering; performance evaluation; Big Data applications; Hadoop; MARLA; MapReduce framework; MapReduce programming model; data centers; dynamic data partitioning; finer grained subtasking; nonuniform processing capability; performance-heterogeneity; performance-heterogeneous clusters; performance-heterogeneous environments; worker labor; Bars; Big data; Data models; Delays; Program processors; Random access memory; Runtime; Big Data; Heterogeneous; MapReduce;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Big Data (BigData Congress), 2014 IEEE International Congress on
Conference_Location :
Anchorage, AK
Print_ISBN :
978-1-4799-5056-0
Type :
conf
DOI :
10.1109/BigData.Congress.2014.26
Filename :
6906769