• DocumentCode
    725395
  • Title

    Selection of Virtual Machines Based on Classification of MapReduce Jobs

  • Author

    Blaisse, Adam Pasqua ; Wagner, Zachary Andrew ; Jie Wu

  • Author_Institution
    Dept. of Comput. & Inf. Sci., Temple Univ., Philadelphia, PA, USA
  • fYear
    2015
  • fDate
    June 29 2015-July 2 2015
  • Firstpage
    82
  • Lastpage
    86
  • Abstract
    The MapReduce computing paradigm has become a popular and useful tool since its introduction. Many large companies, including Facebook, IBM, Yahoo, Twitter, and Google, have found ways to incorporate MapReduce into their operations. A driving force behind the growing popularity of MapReduce is the need for a system that can handle and process large data sets. MapReduce is a distributed system that can handle larger quantities of data by adding more servers to a cluster. As data sets keep growing, running MapReduce clusters have had to grow with them, and this growth can lead to problems. Often, newly added servers are not the same type as the servers already in the cluster. This is a problem because both MapReduce and Hadoop, its open source implementation, assume that all servers in a cluster are the same. Because of this, many researchers have focused on improving scheduling within MapReduce for heterogeneous clusters. More recently, cloud computing has become popular. The idea is to run virtual machines within a cluster of servers. Since these machines are virtual, we can spin up as many identical machines as a project calls for. While this seems like a good fix for the heterogeneous MapReduce cluster problem, it leads to other issues. This paper addresses one major issue: selecting virtual machines that maximize the speed of a MapReduce job.
  • Keywords
    data handling; parallel programming; virtual machines; Hadoop; MapReduce computing paradigm; distributed system; heterogeneous MapReduce cluster problem; identical machines; open source implementation; Companies; Conferences; Facebook; Measurement; Random access memory; Servers; Virtual machining; Cloud Computing; Eucalyptus; MapReduce; Virtual Machine
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Distributed Computing Systems Workshops (ICDCSW), 2015 IEEE 35th International Conference on
  • Conference_Location
    Columbus, OH
  • Type

    conf

  • DOI
    10.1109/ICDCSW.2015.25
  • Filename
    7165088