Title :
Provisioning and Evaluating Multi-domain Networked Clouds for Hadoop-based Applications
Author :
Mandal, Anirban ; Xin, Yufeng ; Baldine, Ilia ; Ruth, Paul ; Heerman, Chris ; Chase, Jeff ; Orlikowski, Victor ; Yumerefendi, Aydan
Author_Institution :
Renaissance Comput. Inst., Univ. of North Carolina at Chapel Hill, Chapel Hill, NC, USA
fDate :
Nov. 29 2011-Dec. 1 2011
Abstract :
This paper presents the design, implementation, and evaluation of a new system for on-demand provisioning of Hadoop clusters across multiple cloud domains. The Hadoop clusters are created "on-demand" and are composed of virtual machines from multiple cloud sites linked with bandwidth-provisioned network pipes. The prototype uses an existing federated cloud control framework called Open Resource Control Architecture (ORCA), which orchestrates the leasing and configuration of virtual infrastructure from multiple autonomous cloud sites and network providers. ORCA enables computational and network resources from multiple clouds and network substrates to be aggregated into a single virtual "slice" of resources, built to order for the needs of the application. The experiments examine various provisioning alternatives by evaluating the performance of representative Hadoop benchmarks and applications on resource topologies with varying bandwidths. The evaluations examine conditions in which multi-cloud Hadoop deployments pose significant advantages or disadvantages during Map/Reduce/Shuffle operations. Further, the experiments compare multi-cloud Hadoop deployments with single-cloud deployments and investigate Hadoop Distributed File System (HDFS) performance under varying network configurations. The results show that networked clouds make cross-cloud Hadoop deployment feasible with high bandwidth network links between clouds. As expected, performance for some benchmarks degrades rapidly with constrained inter-cloud bandwidth. MapReduce shuffle patterns and certain Hadoop Distributed File System (HDFS) operations that span the constrained links are particularly sensitive to network performance. Hadoop's topology-awareness feature can mitigate these penalties to a modest degree in these hybrid bandwidth scenarios.
Additional observations show that contention among co-located virtual machines is a source of irregular performance for Hadoop applications on virtual cloud infrastructure.
Keywords :
cloud computing; file organisation; virtual machines; HDFS; Hadoop cluster; Hadoop distributed file system; Hadoop-based application; MapReduce shuffle pattern; ORCA; bandwidth-provisioned network pipe; federated cloud control framework; multidomain networked cloud; open resource control architecture; topology-awareness feature; virtual cloud infrastructure; virtual machine; Bandwidth; Benchmark testing; Educational institutions; Network topology; Prototypes; Topology; Virtual machining; Multi-cloud provisioning; Network provisioning; Networked clouds; Performance analysis of Hadoop/MapReduce;
Conference_Title :
Cloud Computing Technology and Science (CloudCom), 2011 IEEE Third International Conference on
Conference_Location :
Athens
Print_ISBN :
978-1-4673-0090-2
DOI :
10.1109/CloudCom.2011.107