• DocumentCode
    3435017
  • Title

    Improving the Shuffle of Hadoop MapReduce

  • Author

    Jingui Li ; Xuelian Lin ; Xiaolong Cui ; Yue Ye

  • Author_Institution
    Sch. of Comput. Sci. & Eng., Beihang Univ., Beijing, China
  • Volume
    1
  • fYear
    2013
  • fDate
    2-5 Dec. 2013
  • Firstpage
    266
  • Lastpage
    273
  • Abstract
    As an efficient parallel computing system based on the MapReduce model, Hadoop is widely used for large-scale data analysis such as data mining, machine learning and scientific simulation. However, MapReduce still has performance problems, especially in the shuffle phase. To address these problems, this paper proposes a lightweight, independent shuffle service component with a more efficient I/O policy to replace the existing shuffle phase in MapReduce. We describe how to implement the shuffle service in three steps: extract the shuffle from the reduce task as a shuffle task, restructure the shuffle task as a service, and improve the I/O scheduling policy on the Map side. Both simulated experiments and comparative studies of MapReduce jobs are conducted to evaluate the performance of our improvements. The results show that our approach can decrease the overall job execution time and make full use of cluster resources.
  • Keywords
    data analysis; data mining; input-output programs; learning (artificial intelligence); parallel programming; public domain software; software performance evaluation; Hadoop MapReduce shuffle improvement; I-O scheduling policy improvement; Map sides; data mining; large-scale data analysis; machine learning; parallel computing system; performance evaluation; reduce task; scientific simulation; shuffle extraction; shuffle service component; shuffle task-as-a-service; Bandwidth; Computational modeling; Data models; Facebook; Google; Memory management; Protocols; hadoop; mapreduce; shuffle
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Cloud Computing Technology and Science (CloudCom), 2013 IEEE 5th International Conference on
  • Conference_Location
    Bristol
  • Type

    conf

  • DOI
    10.1109/CloudCom.2013.42
  • Filename
    6753807