Title :
An Adaptive Framework for the Execution of Data-Intensive MapReduce Applications in the Cloud
Author :
Koehler, Martin ; Kaniovskyi, Yuriy ; Benkner, Siegfried
Author_Institution :
Dept. of Sci. Comput., Univ. of Vienna, Vienna, Austria
Abstract :
Cloud computing technologies play an increasingly important role in realizing data-intensive applications by offering a virtualized compute and storage infrastructure that can scale on demand. A programming model that has gained a lot of interest in this context is MapReduce, which simplifies the processing of large-scale distributed data volumes, usually on top of a distributed file system layer. In this paper we report on a self-configuring adaptive framework for developing and optimizing data-intensive scientific applications on top of Cloud and Grid computing technologies and the Hadoop framework. Our framework relies on a MAPE-K loop, known from autonomic computing, for optimizing the configuration of data-intensive applications at three abstraction layers: the application layer, the MapReduce layer, and the resource layer. By evaluating monitored resources, the framework configures the layers and allocates the resources on a per-job basis. The evaluation of configurations relies on historic data and a utility function that ranks different configurations with respect to the costs they incur. The optimization framework has been integrated into the Vienna Grid Environment (VGE), a service-oriented application development environment for providing applications on HPC systems, clusters and Clouds as services. An experimental evaluation of our framework has been undertaken with a data-analysis application from the field of molecular systems biology.
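The ranking step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Config` fields, the cost formula, the `price_per_node_hour` value, and the `run_job` callback are all assumptions introduced here for illustration. The sketch only shows the general shape of one MAPE-K iteration in which historic runtimes (the knowledge base) feed a utility function that ranks candidate configurations by estimated cost.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Config:
    """Hypothetical per-job configuration, one field per abstraction layer."""
    app_chunk_mb: int   # application layer (assumed parameter)
    reduce_tasks: int   # MapReduce layer (assumed parameter)
    nodes: int          # resource layer (assumed parameter)


def utility(config, history, price_per_node_hour=0.10):
    """Estimate the cost of a configuration from historic runtimes (seconds).

    Lower is better. The cost model (mean runtime x nodes x price) is an
    assumption, not the utility function used in the paper.
    """
    runtimes = history.get(config)
    if not runtimes:
        return float("inf")  # no historic data for this configuration yet
    hours = mean(runtimes) / 3600.0
    return hours * config.nodes * price_per_node_hour


def plan(candidates, history):
    """Rank candidate configurations from cheapest to most expensive."""
    return sorted(candidates, key=lambda c: utility(c, history))


def mape_k_step(candidates, history, run_job):
    """One loop iteration: analyze and plan, execute, then record the result."""
    best = plan(candidates, history)[0]            # analyze + plan
    runtime = run_job(best)                        # execute (caller-supplied)
    history.setdefault(best, []).append(runtime)   # monitor -> knowledge
    return best, runtime
```

In this sketch a configuration without historic data is ranked last; a real system would need some exploration strategy to try untested configurations, which the abstract does not describe.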
Keywords :
biology computing; cloud computing; data analysis; distributed databases; fault tolerant computing; grid computing; scientific information systems; HPC system; Hadoop framework; MAPE-K loop; MapReduce layer; Vienna grid environment; abstraction layer; application layer; autonomic computing; cloud computing; data-analysis application; data-intensive MapReduce application; data-intensive scientific application; distributed file system layer; grid computing; large-scale distributed data volume; molecular systems biology; optimization framework; programming model; resource layer; self-configuring adaptive framework; service-oriented application development; virtualized compute infrastructure; virtualized storage infrastructure; Adaptation models; Distributed databases; Monitoring; Runtime; Systems biology; XML;
Conference_Title :
Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW), 2011 IEEE International Symposium on
Conference_Location :
Shanghai
Print_ISBN :
978-1-61284-425-1
Electronic_ISBN :
1530-2075
DOI :
10.1109/IPDPS.2011.254