Title :
Improving Hadoop Service Provisioning in a Geographically Distributed Cloud
Author :
Qi Zhang ; Ling Liu ; Kisung Lee ; Yang Zhou ; Ashutosh Singh ; Nagapramod Mandagere ; Sandeep Gopisetty ; Gabriel Alatorre
Date :
June 27, 2014 - July 2, 2014
Abstract :
With more data generated and collected in a geographically distributed manner, combined with the increased computational requirements of large-scale data-intensive analysis, we have witnessed a growing demand for geographically distributed Cloud datacenters and hybrid Cloud service provisioning, which enable organizations to meet instantaneous demand for additional computational resources and to extend in-house resources with cloud resources in order to handle peak service demands. A key challenge for running applications in such a geographically distributed computing environment is how to efficiently schedule and perform analysis over data that is spread across multiple datacenters. In this paper, we first compare multi-datacenter Hadoop deployment with single-datacenter Hadoop deployment to identify the performance issues inherent in a geographically distributed cloud. We also generalize the problem characterization to the context of geographically distributed cloud datacenters and discuss general optimization strategies. We then describe the design and implementation of a suite of system-level optimizations for improving the performance of Hadoop service provisioning in a geo-distributed cloud, including prediction-based job localization, configurable HDFS data placement, and data prefetching. Our experimental evaluation shows that our prediction-based localization has a very low error ratio, below 5%, and that our optimization can improve the execution time of the Reduce phase by 48.6%.
Keywords :
cloud computing; parallel programming; resource allocation; Hadoop service provisioning; cloud resource utilization; configurable HDFS data placement; data intensive analysis; data prefetching; geographically distributed cloud data center; hybrid cloud service provisioning; prediction based localization; prediction-based job localization; Cloud computing; Distributed databases; Optimization; Predictive models; Schedules; Virtualization; Cross-cloud Hadoop deployment; Geographically distributed cloud; Hybrid cloud; Performance optimization;
Conference_Title :
Cloud Computing (CLOUD), 2014 IEEE 7th International Conference on
Conference_Location :
Anchorage, AK
Print_ISBN :
978-1-4799-5062-1
DOI :
10.1109/CLOUD.2014.65