• DocumentCode
    659415
  • Title
    On the performance and energy efficiency of Hadoop deployment models
  • Author
    Feller, E. ; Ramakrishnan, Lavanya ; Morin, Christine
  • Author_Institution
    Campus Univ. de Beaulieu, Inria Centre Rennes Bretagne-Atlantique, Rennes, France
  • fYear
    2013
  • fDate
    6-9 Oct. 2013
  • Firstpage
    131
  • Lastpage
    136
  • Abstract
    The exponential growth of scientific and business data has driven the evolution of cloud computing and the MapReduce parallel programming model. Cloud computing emphasizes increased utilization and power savings through consolidation, while MapReduce enables large-scale data analysis. Hadoop has recently become the standard framework implementing the MapReduce model. In this paper, we evaluate Hadoop performance under both the traditional model of collocated data and compute services and a model in which these services are separated. Separating data and compute services provides more flexibility in environments where data locality might not have a considerable impact, such as virtualized environments and clusters with advanced networks. We also conduct an energy efficiency evaluation of Hadoop on physical and virtual clusters in different configurations. Our extensive evaluation shows that: (1) performance on physical clusters is significantly better than on virtual clusters; (2) performance degradation due to separation of the services depends on the data-to-compute ratio; (3) application completion progress correlates with power consumption, and power consumption is heavily application specific.
  • Keywords
    cloud computing; data handling; energy conservation; parallel programming; power aware computing; Hadoop deployment models; MapReduce parallel programming model; application completion progress; business data; cloud computing; collocated data; compute services; energy efficiency; energy efficiency evaluation; large scale data analysis; physical clusters; power consumption; power savings; scientific data; virtual clusters; virtualized environments; Computational modeling; Data models; Electronic publishing; Encyclopedias; Power demand; Servers; Cloud Computing; Energy Efficiency; Hadoop MapReduce; Performance; Virtualization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Big Data, 2013 IEEE International Conference on
  • Conference_Location
    Silicon Valley, CA
  • Type
    conf
  • DOI
    10.1109/BigData.2013.6691564
  • Filename
    6691564