• DocumentCode
    3462270
  • Title
    Dynamic Profiling and Feedback Framework for Reduce-Side Join
  • Author
    Nakayama, Makoto ; Yamazaki, Kinya ; Tanaka, Shoji ; Kasahara, Hironori
  • Author_Institution
    Res. Labs., NTT DOCOMO, Inc., Japan
  • fYear
    2013
  • fDate
    3-5 Dec. 2013
  • Firstpage
    1255
  • Lastpage
    1262
  • Abstract
    MapReduce has become popular, and Reduce-side join is one of its most important applications. Data skew, in which the data load assigned to each Reduce task varies from task to task, increases MapReduce job completion time. This paper proposes a dynamic profiling and feedback framework that works on a MapReduce cluster. The framework allows programmers to build their own algorithms to address data skew in Reduce-side join based on their specific knowledge and/or requirements. This paper also proposes an estimation method that lets the framework adapt to a wide range of MapReduce cluster sizes. Two example algorithms that address data skew using the estimation method are presented, and the experimental results show a speed-up of up to 2.59 times in join completion time on a cluster with 50 servers and highly skewed input data.
  • Keywords
    parallel programming; program diagnostics; MapReduce cluster sizes; MapReduce job completion time; data load assignment; data skew; dynamic profiling; feedback framework; reduce task; reduce-side join; skewed input data; Clustering algorithms; Estimation; Feedback control; Measurement; Monitoring; Partitioning algorithms; Servers; Data skew; Feedback; Framework; Profiling; Reduce-side Join;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Computational Science and Engineering (CSE), 2013 IEEE 16th International Conference on
  • Conference_Location
    Sydney, NSW
  • Type
    conf
  • DOI
    10.1109/CSE.2013.187
  • Filename
    6755369