Title :
Portable Parallel Programming on Cloud and HPC: Scientific Applications of Twister4Azure
Author :
Gunarathne, Thilina ; Zhang, Bingjing ; Wu, Tak-Lon ; Qiu, Judy
Author_Institution :
Sch. of Inf. & Comput., Indiana Univ., Bloomington, IN, USA
Abstract :
Recent advancements in data-intensive computing for scientific discovery are fueling dramatic growth in the use of data-intensive iterative computations. The utility computing model introduced by cloud computing, combined with the rich set of cloud infrastructure services, offers a very attractive environment for scientists to perform such data-intensive computations. The challenges of large-scale distributed computation on clouds demand new computation frameworks that are specifically tailored to cloud characteristics, so that the power of clouds can be harnessed easily and effectively. Twister4Azure is a distributed, decentralized iterative MapReduce runtime for the Windows Azure Cloud. It extends the familiar, easy-to-use MapReduce programming model with iterative extensions, enabling a wide array of large-scale iterative data analyses for scientific applications on the Azure cloud. This paper presents the applicability of Twister4Azure, highlighting its fault tolerance, efficiency and simplicity. We study three data-intensive applications: two iterative scientific applications, Multi-Dimensional Scaling and KMeans Clustering, and one data-intensive pleasingly parallel scientific application, BLAST+ sequence searching. Performance measurements show results comparable to, or a factor of 2 to 4 better than, those of traditional MapReduce runtimes, for deployments of up to 256 instances and jobs with tens of thousands of tasks.
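The iterative MapReduce pattern the abstract describes can be illustrated with KMeans, one of the studied applications. The following is a minimal, single-process Python sketch under stated assumptions: it is not Twister4Azure's API (Twister4Azure is a .NET runtime built on Azure services), and the function names map_task, reduce_task and iterative_kmeans are hypothetical. It assumes the input partitions are loop-invariant and reused in every iteration, while only the small centroid list changes and is passed to each map task; a convergence test in the driver decides whether another MapReduce round runs.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
# Per-centroid partial result: (coordinate sums, point count)
Partial = Dict[int, Tuple[List[float], int]]


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def map_task(partition: Sequence[Point], centroids: Sequence[Point]) -> Partial:
    """Map: assign each point of one partition to its nearest centroid and
    emit per-centroid partial sums and counts (combiner-style)."""
    partial: Partial = {}
    for pt in partition:
        k = nearest(pt, centroids)
        sums, count = partial.get(k, ([0.0] * len(pt), 0))
        partial[k] = ([s + x for s, x in zip(sums, pt)], count + 1)
    return partial


def reduce_task(partials: Sequence[Partial], centroids: Sequence[Point]) -> List[Point]:
    """Reduce: merge the partial sums from all map tasks into new centroids.
    A centroid that received no points keeps its previous position."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for partial in partials:
        for k, (s, c) in partial.items():
            sums[k] = [a + b for a, b in zip(sums[k], s)]
            counts[k] += c
    return [
        tuple(v / counts[k] for v in sums[k]) if counts[k] else tuple(centroids[k])
        for k in range(len(centroids))
    ]


def iterative_kmeans(partitions: Sequence[Sequence[Point]],
                     centroids: Sequence[Point],
                     max_iter: int = 20,
                     tol: float = 1e-9) -> Tuple[List[Point], int]:
    """Driver: the same (assumed cached) partitions are reused every iteration;
    only the centroids are sent to the map tasks each round."""
    current = [tuple(c) for c in centroids]
    iteration = 0
    for iteration in range(1, max_iter + 1):
        partials = [map_task(p, current) for p in partitions]  # map phase
        new_centroids = reduce_task(partials, current)          # reduce phase
        shift = max(
            sum((a - b) ** 2 for a, b in zip(n, o))
            for n, o in zip(new_centroids, current)
        )
        current = new_centroids
        if shift <= tol:  # converged: stop launching new iterations
            break
    return current, iteration
```

For example, with two partitions [(0.0, 0.0), (0.0, 1.0)] and [(10.0, 10.0), (10.0, 11.0)] and initial centroids (0.0, 0.0) and (10.0, 10.0), the first round moves the centroids to (0.0, 0.5) and (10.0, 10.5), and the second round detects convergence. Emitting partial sums from each map task, rather than individual points, keeps the data shuffled to the reduce step small, which is the usual design choice for KMeans on MapReduce.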
Keywords :
cloud computing; data analysis; iterative methods; parallel programming; pattern clustering; scientific information systems; software fault tolerance; BLAST+ sequence searching; HPC; K-means clustering; MapReduce programming model; Twister4Azure application; Windows Azure Cloud; cloud infrastructure services; data intensive computing; data-intensive iterative computation; distributed decentralized iterative MapReduce runtime; fault tolerance; large-scale distributed computation; large-scale iterative data analysis; multidimensional scaling; portable parallel programming; science discovery; utility computing model; Computational modeling; Computer architecture; Data models; Distributed databases; Programming; Runtime; Iterative MapReduce; Scientific applications;
Conference_Title :
Utility and Cloud Computing (UCC), 2011 Fourth IEEE International Conference on
Conference_Location :
Victoria, NSW
Print_ISBN :
978-1-4577-2116-8
DOI :
10.1109/UCC.2011.23