• DocumentCode
    262260
  • Title
    Dynamic Workload Balancing for Hadoop MapReduce
  • Author
    Xiaofei Hou ; Ashwin Kumar, T.K. ; Thomas, Johnson P. ; Varadharajan, Vijay
  • Author_Institution
    Comput. Sci. Dept., Oklahoma State Univ., Stillwater, OK, USA
  • fYear
    2014
  • fDate
    3-5 Dec. 2014
  • Firstpage
    56
  • Lastpage
    62
  • Abstract
    Hadoop has two components: HDFS and MapReduce. HDFS is a distributed file system that stores data for Hadoop users, and MapReduce is the framework that executes users' jobs. Hadoop places user data according to the space utilization of the data nodes in the cluster rather than their processing capability. Furthermore, Hadoop runs in a heterogeneous environment, since the data nodes may not all be homogeneous. For these reasons, workload imbalances occur when Hadoop runs, resulting in poor performance. In this paper, we propose a dynamic algorithm that balances the workload between different racks in a Hadoop cluster based on information obtained by analyzing Hadoop's log files. Moving tasks from the busiest rack to another rack improves the performance of Hadoop MapReduce by reducing the running time of jobs. Our simulations indicate that the algorithm can decrease by more than 50% the remaining time of the tasks that belong to a job running on the busiest rack.
  • Keywords
    data handling; parallel processing; Hadoop MapReduce; Hadoop cluster; dynamic algorithm; dynamic workload balancing; Bandwidth; Big data; Clustering algorithms; Heuristic algorithms; Load management; Switches; Dynamic Workload balancing; Hadoop; MapReduce; OpenFlow
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Big Data and Cloud Computing (BdCloud), 2014 IEEE Fourth International Conference on
  • Conference_Location
    Sydney, NSW
  • Type
    conf
  • DOI
    10.1109/BDCloud.2014.103
  • Filename
    7034766