DocumentCode :
1922679
Title :
Design and performance evaluation for Hadoop clusters on virtualized environment
Author :
Ishii, M. ; Jungkyu Han ; Makino, Hiroaki
Author_Institution :
Software Innovation Center, Nippon Telegraph & Telephone Corp., Tokyo, Japan
fYear :
2013
fDate :
28-30 Jan. 2013
Firstpage :
244
Lastpage :
249
Abstract :
Hadoop, an implementation of Google's MapReduce, is widely used these days for big data analysis. Yahoo Inc. operated 25 PB with 25,000 nodes in 2010. Resource management for such a large number of nodes is quite difficult in terms of configuration, deployment, and efficient resource utilization. Deploying virtual machines (VMs) makes Hadoop management much easier. Amazon has already released Hadoop on a Xen-virtualized environment as Elastic MapReduce. However, Hadoop performance degrades on VM clusters because of virtualization overhead, so it is important to minimize that overhead. We build a Hadoop performance model and examine how performance is affected by changes in VM configuration, in the allocation of VMs over physical machines, and in the multiplicity of jobs. We find that the performance of I/O-intensive jobs is more sensitive to virtualization overhead than that of CPU-intensive jobs. For I/O-intensive jobs, the performance degradation caused by a VM configuration change is at most 55%, and the degradation caused by an allocation change is at most 18%. For I/O-intensive jobs, the best practice is to increase the number of VMs rather than the number of VCPUs per VM, to allocate VMs widely over physical servers, and to decrease the number of simultaneously executed jobs. The main source of virtualization overhead is disk I/O shared by multiple VMs on a physical server.
Keywords :
Internet; computer network management; computer network performance evaluation; resource allocation; virtual machines; Elastic MapReduce; Google MapReduce; Hadoop cluster; Hadoop management; Xen-virtualized environment; data analysis; performance evaluation; resource management; virtual machine; Degradation; Equations; Resource management; Servers; Switches; Throughput; Virtualization; Hadoop performance evaluation; KVM; cluster; virtual machine; virtualized environment;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information Networking (ICOIN), 2013 International Conference on
Conference_Location :
Bangkok
ISSN :
1976-7684
Print_ISBN :
978-1-4673-5740-1
Electronic_ISBN :
1976-7684
Type :
conf
DOI :
10.1109/ICOIN.2013.6496384
Filename :
6496384