DocumentCode :
3607809
Title :
Optimizing big data processing performance in the public cloud: opportunities and approaches
Author :
Dan Wang ; Jiangchuan Liu
Author_Institution :
Dept. of Comput., Hong Kong Polytech. Univ., Hong Kong, China
Volume :
29
Issue :
5
Year :
2015
Firstpage :
31
Lastpage :
35
Abstract :
Today's lightning-fast data generation from massive sources calls for efficient big data processing, which imposes unprecedented demands on computing and networking infrastructures. State-of-the-art tools, most notably MapReduce, are generally run on dedicated server clusters to exploit data parallelism. For grassroots users or non-computing professionals, the cost of deploying and maintaining a large-scale dedicated server cluster can be prohibitively high, not to mention the technical skills involved. On the other hand, public clouds allow general users to rent virtual machines (VMs) and run their applications in a pay-as-you-go manner, with ultra-high scalability and minimal upfront costs. This new computing paradigm has gained tremendous success in recent years, becoming a highly attractive alternative to dedicated server clusters. This article discusses the critical challenges and opportunities when big data meet the public cloud. We identify the key differences between running big data processing in a public cloud and in dedicated server clusters. We then present two important problems for efficient big data processing in the public cloud: resource provisioning (i.e., how to rent VMs) and VM-MapReduce job/task scheduling (i.e., how to run MapReduce after the VMs are constructed). Each of these two questions has a set of problems to solve. We present solution approaches for certain problems and offer optimized design guidelines for others. Finally, we discuss our implementation experiences.
Keywords :
Big Data; cloud computing; parallel processing; virtual machines; Big Data processing; VM-MapReduce job/task scheduling; computing infrastructures; data generation; data parallelism; large-scale dedicated server clusters; networking infrastructures; performance optimization; public cloud; resource provisioning; Data processing; Runtime; Servers; Virtualization
Language :
English
Journal_Title :
Network, IEEE
Publisher :
IEEE
ISSN :
0890-8044
Type :
jour
DOI :
10.1109/MNET.2015.7293302
Filename :
7293302