DocumentCode
2976793
Title
Network Load Analysis and Provisioning of MapReduce Applications
Author
Rizvandi, Nikzad Babaii ; Taheri, Javid ; Moraveji, Reza ; Zomaya, Albert Y.
Author_Institution
Center for Distrib. & High Performance Comput., Univ. of Sydney, Sydney, NSW, Australia
fYear
2012
fDate
14-16 Dec. 2012
Firstpage
161
Lastpage
166
Abstract
In this paper, we study the dependency between MapReduce configuration parameters and the network load of fixed-size MapReduce jobs during the shuffle phase, and we propose an analytical method to model this dependency. Our approach consists of three key phases: profiling, modeling, and prediction. In the profiling phase, an application is run several times with different sets of MapReduce configuration parameters (here, the number of map tasks and the number of reduce tasks) to profile its network load in the shuffle phase on a given cluster. The relation between these parameters and the network load is then modeled by multivariate linear regression. We evaluate the technique with three applications (Word Count, Exim Main log parsing, and TeraSort) on a 5-node private MapReduce cluster.
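The profiling, modeling, and prediction phases described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes a first-order model of the form load = b0 + b1 * maps + b2 * reduces (the abstract does not give the exact regression terms), the function names are hypothetical, and the profiling numbers are synthetic values made up for the example rather than measurements from the paper.

```python
def fit_linear(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    rows    -- one (maps, reduces) pair per profiling run
    targets -- observed shuffle-phase network load for each run
    Returns [b0, b1, b2] with b0 as the intercept.
    """
    # Prepend a constant 1.0 so the model has an intercept term.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])
    # Build the augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("profiling runs do not vary enough to fit the model")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict_load(coef, maps, reduces):
    """Predict network load for an unseen (maps, reduces) setting."""
    return coef[0] + coef[1] * maps + coef[2] * reduces


# Profiling phase: SYNTHETIC runs, generated from a known linear rule
# purely so the fit can be checked. These are not results from the paper.
runs = [(4, 2), (4, 4), (8, 2), (8, 4), (16, 2), (16, 4)]
loads = [10.0 + 0.5 * m + 2.0 * r for m, r in runs]

# Modeling phase: fit the regression coefficients.
coef = fit_linear(runs, loads)

# Prediction phase: estimate load for a configuration that was not profiled.
estimate = predict_load(coef, 12, 3)
```

Because the synthetic loads follow an exact linear rule, the fit recovers that rule; real profiling data would be noisy, and the quality of the fit would need to be checked against held-out runs.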
Keywords
parallel processing; regression analysis; 5-node MapReduce private cluster; MapReduce applications; MapReduce configuration parameters; multivariate linear regression; network load analysis; Accuracy; Computational modeling; Data models; Distributed computing; Load modeling; Mathematical model; Predictive models; Configuration parameters; MapReduce; multivariate linear regression; network load analysis; number of map tasks; number of reduce tasks; provisioning;
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel and Distributed Computing, Applications and Technologies (PDCAT), 2012 13th International Conference on
Conference_Location
Beijing
Print_ISBN
978-0-7695-4879-1
Type
conf
DOI
10.1109/PDCAT.2012.100
Filename
6589257