• DocumentCode
    3065441
  • Title

    Dynamic Data Redistribution for MapReduce Joins

  • Author

    Lynden, Steven ; Tanimura, Yusuke ; Kojima, Isao ; Matono, Akiyoshi

  • Author_Institution
    Inf. Technol. Res. Inst., Nat. Inst. of Adv. Ind. Sci. & Technol. (AIST), Tsukuba, Japan
  • fYear
    2011
  • fDate
    Nov. 29 2011-Dec. 1 2011
  • Firstpage
    717
  • Lastpage
    723
  • Abstract
    MapReduce has become a popular method for data processing, in particular for large-scale datasets, due to its accessibility as a scalable yet convenient programming paradigm. Data processing tasks often involve joins, and the repartition and fragment-replicate joins are two widely used join algorithms utilised within the MapReduce framework. This paper presents a multi-join that supports tuple redistribution, building on both the repartition and fragment-replicate joins. Hadoop is used to demonstrate how reduce tasks may improve performance by passing intermediate results to other reduce tasks that are better able to process them, using Apache ZooKeeper as a means of communication and data transfer. A performance analysis is presented, showing that the technique has the potential to reduce response times when processing multiple joins in single MapReduce jobs.
  • Keywords
    data handling; parallel programming; Apache ZooKeeper; Hadoop; MapReduce joins; data processing; dynamic data redistribution; fragment-replicate joins; repartition joins; Algorithm design and analysis; Monitoring; Partitioning algorithms; Query processing; Resource description framework; Servers; Time factors; Database management; MapReduce; Query Processing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cloud Computing Technology and Science (CloudCom), 2011 IEEE Third International Conference on
  • Conference_Location
    Athens
  • Print_ISBN
    978-1-4673-0090-2
  • Type

    conf

  • DOI
    10.1109/CloudCom.2011.111
  • Filename
    6133220