An Adaptive Framework for the Execution of Data-Intensive MapReduce Applications in the Cloud

Author

Koehler, Martin ; Kaniovskyi, Yuriy ; Benkner, Siegfried

Author_Institution

Dept. of Sci. Comput., Univ. of Vienna, Vienna, Austria

fYear

2011

fDate

16-20 May 2011

Firstpage

1122

Lastpage

1131

Abstract

Cloud computing technologies play an increasingly important role in realizing data-intensive applications by offering a virtualized compute and storage infrastructure that can scale on demand. A programming model that has gained considerable interest in this context is MapReduce, which simplifies the processing of large-scale distributed data volumes, usually on top of a distributed file system layer. In this paper we report on a self-configuring adaptive framework for developing and optimizing data-intensive scientific applications on top of Cloud and Grid computing technologies and the Hadoop framework. Our framework relies on a MAPE-K loop, known from autonomic computing, for optimizing the configuration of data-intensive applications at three abstraction layers: the application layer, the MapReduce layer, and the resource layer. By evaluating monitored resources, the framework configures the layers and allocates the resources on a per-job basis. The evaluation of configurations relies on historical data and a utility function that ranks different configurations with respect to the costs they incur. The optimization framework has been integrated into the Vienna Grid Environment (VGE), a service-oriented application development environment for providing applications on HPC systems, clusters and Clouds as services. An experimental evaluation of our framework has been undertaken with a data-analysis application from the field of molecular systems biology.
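
The ranking step described in the abstract (historical data plus a utility function that orders configurations by cost) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the `Config` fields, the sample runtimes, the price per node hour, and the form of the `utility` function are all hypothetical, since the abstract does not specify them.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Config:
    """Hypothetical job configuration spanning two of the three layers."""
    nodes: int      # resource layer: number of allocated nodes (assumed knob)
    map_tasks: int  # MapReduce layer: number of map tasks (assumed knob)


# Knowledge base of the loop: past runtimes in seconds per configuration.
# The values are made up for illustration.
HISTORY = {
    Config(2, 4): [620.0, 640.0],
    Config(4, 8): [330.0, 350.0],
    Config(8, 16): [200.0, 220.0],
}


def utility(config, runtimes, price_per_node_hour=0.10, time_weight=0.0005):
    """Assumed cost model, lower is better: resource cost plus weighted runtime."""
    runtime = mean(runtimes)
    resource_cost = config.nodes * price_per_node_hour * runtime / 3600.0
    return resource_cost + time_weight * runtime


def plan(history):
    """Analyze/plan step: pick the configuration with the lowest estimated cost."""
    return min(history, key=lambda cfg: utility(cfg, history[cfg]))


def record(history, config, runtime):
    """Monitor step: feed the observed runtime of a finished job back into the knowledge base."""
    history.setdefault(config, []).append(runtime)
```

In such a loop, `plan` would be called before each job and `record` after it, so that a configuration that starts to perform poorly is ranked lower for subsequent jobs.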

Keywords

biology computing; cloud computing; data analysis; distributed databases; fault tolerant computing; grid computing; scientific information systems; HPC system; Hadoop framework; MAPE-K loop; MapReduce layer; Vienna grid environment; abstraction layer; application layer; autonomic computing; data-analysis application; data-intensive MapReduce application; data-intensive scientific application; distributed file system layer; large-scale distributed data volume; molecular systems biology; optimization framework; programming model; resource layer; self-configuring adaptive framework; service-oriented application development; virtualized compute infrastructure; virtualized storage infrastructure; Adaptation models; Monitoring; Runtime; Systems biology; XML

fLanguage

English

Publisher

IEEE

Conference_Titel

Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW), 2011 IEEE International Symposium on

Conference_Location

Shanghai

ISSN

1530-2075

Print_ISBN

978-1-61284-425-1

Electronic_ISBN

1530-2075

Type

conf

DOI

10.1109/IPDPS.2011.254

Filename

6008900