• DocumentCode
    1665508
  • Title
    A Study of Data Locality in YARN
  • Author
    Elshater, Yehia ; Martin, Patrick ; Rope, Dan ; McRoberts, Mike ; Statchuk, Craig
  • Author_Institution
    Sch. of Comput., Queen's Univ., Kingston, ON, Canada
  • fYear
    2015
  • Firstpage
    174
  • Lastpage
    181
  • Abstract
    Co-locating computation as close as possible to the data is an important consideration in current data-intensive systems. This is known as the data locality problem. In this paper, we analyze the impact of data locality on YARN, the new version of Hadoop. We investigate the behavior of the YARN delay scheduler with respect to data locality for a variety of workloads and configurations. We address three problems related to data locality. First, we study the trade-off between data locality and job completion time. Second, we observe that there is an imbalance in resource allocation when data locality is considered, which may under-utilize the cluster. Third, we address the redundant I/O operations that occur when different YARN containers request input data blocks on the same node. Additionally, we propose the YARN Locality Simulator (YLocSim), a tool that simulates the interactions between YARN components in a real cluster and reports data locality percentages in real time. We validate YLocSim against a real cluster setup and use it in our study.
  • Keywords
    data handling; digital simulation; input-output programs; parallel processing; resource allocation; scheduling; Hadoop; I/O operation; YARN delay scheduler behavior; YARN locality simulator tool; YLocSim; data intensive system; data locality; Bandwidth; Benchmark testing; Containers; Delays; Resource management; Simulation; YARN
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Big Data (BigData Congress), 2015 IEEE International Congress on
  • Conference_Location
    New York, NY
  • Print_ISBN
    978-1-4673-7277-0
  • Type
    conf
  • DOI
    10.1109/BigDataCongress.2015.33
  • Filename
    7207217