DocumentCode :
3588728
Title :
Model to estimate the size of a Hadoop cluster - HCEm
Author :
de Souza Brito, José Benedito ; Araújo, Aletéia Patrícia F.
Author_Institution :
Dept. of Comput. Sci., Univ. de Brasília (UnB), Brasília, Brazil
fYear :
2014
Firstpage :
859
Lastpage :
866
Abstract :
This paper describes a model, HCEm, which aims to estimate the size of a cluster running the Hadoop framework for processing large datasets within a given timeframe. Its main contributions are: (i) it defines a light optimization layer for MapReduce jobs, (ii) it presents a model to estimate the cluster size for the Hadoop framework, and (iii) it performs tests in a real environment, the Amazon Elastic MapReduce. The proposed approach works with MapReduce to define the main configuration parameters and determines the computational resources of the hosts in the cluster in order to meet the desired runtime for a given workload. The results show that the proposed model is able to avoid over-allocation or under-allocation of computing resources in a Hadoop cluster.
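Note : The abstract does not reproduce the HCEm equations. As a rough illustration of the kind of estimate such a sizing model produces, the Python sketch below computes a node count from an input size and a target runtime; all names, parameters, and the formula itself are illustrative assumptions, not taken from the paper.

    # Hypothetical back-of-the-envelope cluster sizing sketch.
    # This is NOT the HCEm model; the formula and parameters are
    # invented here purely to illustrate the sizing idea.
    import math

    def estimate_cluster_size(data_size_gb: float,
                              per_node_throughput_gb_per_h: float,
                              target_runtime_h: float,
                              overhead_factor: float = 1.2) -> int:
        """Estimate how many worker nodes are needed to process
        data_size_gb within target_runtime_h, assuming each node
        sustains per_node_throughput_gb_per_h of MapReduce work.
        overhead_factor pads for shuffle/JVM/scheduling overhead
        (an assumed constant, not a value from the paper)."""
        effective_work = data_size_gb * overhead_factor
        nodes = effective_work / (per_node_throughput_gb_per_h * target_runtime_h)
        return max(1, math.ceil(nodes))

    # Example: 2 TB of input, ~50 GB/h per node, 4-hour deadline
    print(estimate_cluster_size(2048, 50, 4))  # -> 13 nodes

Rounding up with math.ceil errs on the side of meeting the deadline, which mirrors the paper's stated goal of avoiding under-allocation while the throughput and overhead inputs keep the estimate from grossly over-allocating.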
Keywords :
Big Data; parallel processing; Amazon Elastic MapReduce; HCEm; Hadoop cluster size estimation; MapReduce jobs; Complexity theory; Computational modeling; Data models; Memory management; Optimization; Random access memory; Virtualization; Clusters; Hadoop; MapReduce; Performance Model;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS)
Type :
conf
DOI :
10.1109/PADSW.2014.7097897
Filename :
7097897