• DocumentCode
    3143808
  • Title
    An Adaptive Framework for the Execution of Data-Intensive MapReduce Applications in the Cloud
  • Author
    Koehler, Martin ; Kaniovskyi, Yuriy ; Benkner, Siegfried
  • Author_Institution
    Dept. of Sci. Comput., Univ. of Vienna, Vienna, Austria
  • fYear
    2011
  • fDate
    16-20 May 2011
  • Firstpage
    1122
  • Lastpage
    1131
  • Abstract
    Cloud computing technologies play an increasingly important role in realizing data-intensive applications by offering a virtualized compute and storage infrastructure that can scale on demand. A programming model that has gained a lot of interest in this context is MapReduce, which simplifies the processing of large-scale distributed data volumes, usually on top of a distributed file system layer. In this paper we report on a self-configuring adaptive framework for developing and optimizing data-intensive scientific applications on top of Cloud and Grid computing technologies and the Hadoop framework. Our framework relies on a MAPE-K loop, known from autonomic computing, for optimizing the configuration of data-intensive applications at three abstraction layers: the application layer, the MapReduce layer, and the resource layer. By evaluating monitored resources, the framework configures the layers and allocates the resources on a per-job basis. The evaluation of configurations relies on historic data and a utility function that ranks different configurations according to the costs they incur. The optimization framework has been integrated into the Vienna Grid Environment (VGE), a service-oriented application development environment for providing applications on HPC systems, clusters and Clouds as services. An experimental evaluation of our framework has been undertaken with a data-analysis application from the field of molecular systems biology.
  • Keywords
    biology computing; cloud computing; data analysis; distributed databases; fault tolerant computing; grid computing; scientific information systems; HPC system; Hadoop framework; MAPE-K loop; MapReduce layer; Vienna grid environment; abstraction layer; application layer; autonomic computing; data-analysis application; data-intensive MapReduce application; data-intensive scientific application; distributed file system layer; large-scale distributed data volume; molecular systems biology; optimization framework; programming model; resource layer; self-configuring adaptive framework; service-oriented application development; virtualized compute infrastructure; virtualized storage infrastructure; Adaptation models; Monitoring; Runtime; Systems biology; XML
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW), 2011 IEEE International Symposium on
  • Conference_Location
    Shanghai
  • ISSN
    1530-2075
  • Print_ISBN
    978-1-61284-425-1
  • Electronic_ISBN
    1530-2075
  • Type
    conf
  • DOI
    10.1109/IPDPS.2011.254
  • Filename
    6008900