Deploying and researching Hadoop in virtual machines

Author

Xu, Guanghui ; Xu, Feng ; Ma, Hongxu

Author_Institution

Coll. of Comput. & Inf., Hohai Univ., Nanjing, China

fYear

2012

fDate

15-17 Aug. 2012

Firstpage

395

Lastpage

399

Abstract

Hadoop's emergence and the maturity of virtualization make it feasible to combine the two to process immense data sets. Research on Hadoop in a virtual environment requires an experimental environment. This paper first introduces the technologies used, such as CloudStack, MapReduce and Hadoop. Based on that, a method to deploy CloudStack is given. We then discuss how to deploy Hadoop in virtual machines obtained from CloudStack, and present an algorithm to solve the problem that all virtual machines created by CloudStack from the same template share the same hostname. After that, we run some Hadoop programs on the virtual cluster, which shows that deploying Hadoop in this way is feasible. Some methods to optimize Hadoop in virtual machines are then discussed. Readers can follow this paper to set up their own Hadoop experimental environment and to grasp the current status and trends of optimizing Hadoop in virtual environments.

Keywords

cloud computing; public domain software; virtual machines; virtualisation; CloudStack; Hadoop experimental environment; Hadoop programs; MapReduce; immense data set processing; virtual cluster; virtual environment; virtualization; Java; Programming; Servers; Virtual machining; Hadoop

fLanguage

English

Publisher

IEEE

Conference_Titel

Automation and Logistics (ICAL), 2012 IEEE International Conference on

Conference_Location

Zhengzhou

ISSN

2161-8151

Print_ISBN

978-1-4673-0362-0

Electronic_ISBN

2161-8151

Type

conf

DOI

10.1109/ICAL.2012.6308241

Filename

6308241