DocumentCode :
172931
Title :
Scalability Analysis and Improvement of Hadoop Virtual Cluster with Cost Consideration
Author :
Yanzhang He ; Xiaohong Jiang ; Zhaohui Wu ; Kejiang Ye ; Zhongzhong Chen
Author_Institution :
Coll. of Comput. Sci., Zhejiang Univ., Hangzhou, China
fYear :
2014
fDate :
June 27 2014-July 2 2014
Firstpage :
594
Lastpage :
601
Abstract :
With the rapid development of big data and cloud computing, big data analytics as a service in the cloud is becoming increasingly popular. More and more individuals and organizations tend to rent virtual clusters to store and analyze data rather than build their own data centers. However, in a virtualized environment, it is not clear whether scaling out by using a cluster with more nodes to process big data is better than scaling up by adding more resources to the original virtual machines (VMs) in the cluster. In this paper, we study the scalability performance issues of Hadoop virtual clusters with cost consideration. We first present the design and implementation of the VirtualMR platform, which provides users with scalable Hadoop virtual cluster services for MapReduce-based big data analytics. Then we run a series of Hadoop benchmarks and real parallel machine learning algorithms to evaluate the scalability performance of both the scale-up method and the scale-out method. Finally, we integrate our platform with a resource monitoring module and propose a system tuner. By analyzing the monitored data, we dynamically adjust the parameters of the Hadoop framework and the virtual machine configuration to improve resource utilization and reduce rent cost. Experimental results show that the scale-up method outperforms the scale-out method for CPU-bound applications, while the opposite holds for I/O-bound applications. The results also verify the effectiveness of the system tuner in increasing resource utilization and reducing rent cost.
Keywords :
Big Data; cloud computing; learning (artificial intelligence); virtual machines; CPU-bound application; Hadoop virtual cluster; I/O-bound application; MapReduce; VirtualMR platform; big data analytics; cost consideration; parallel machine learning algorithm; scalability analysis; scale-out method; scale-up method; virtual machine; Benchmark testing; Monitoring; Parallel processing; Resource management; Scalability; Virtualization; rent cost
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Cloud Computing (CLOUD), 2014 IEEE 7th International Conference on
Conference_Location :
Anchorage, AK
Print_ISBN :
978-1-4799-5062-1
Type :
conf
DOI :
10.1109/CLOUD.2014.85
Filename :
6973791