• DocumentCode
    2445412
  • Title

    LEEN: Locality/Fairness-Aware Key Partitioning for MapReduce in the Cloud

  • Author

    Ibrahim, Shadi ; Jin, Hai ; Lu, Lu ; Wu, Song ; He, Bingsheng ; Qi, Li

  • Author_Institution
    Cluster & Grid Comput. Lab., Huazhong Univ. of Sci. & Technol., Wuhan, China
  • fYear
    2010
  • fDate
    Nov. 30 2010-Dec. 3 2010
  • Firstpage
    17
  • Lastpage
    24
  • Abstract
    This paper investigates the problem of partitioning skew in MapReduce-based systems. Our studies with Hadoop, a widely used MapReduce implementation, demonstrate that partitioning skew causes a large amount of data transfer during the shuffle phase and leads to significant unfairness in the reduce input across data nodes. As a result, applications suffer performance degradation from the long data transfer during the shuffle phase and from computation skew, particularly in the reduce phase. We develop a novel algorithm named LEEN for locality-aware and fairness-aware key partitioning in MapReduce. LEEN embraces an asynchronous map and reduce scheme. All buffered intermediate keys are partitioned according to their frequencies and the fairness of the expected data distribution after the shuffle phase. We have integrated LEEN into Hadoop-0.18.0. Our experiments demonstrate that LEEN can efficiently achieve higher locality and reduce the amount of shuffled data. More importantly, LEEN guarantees fair distribution of the reduce inputs. As a result, LEEN achieves a performance improvement of up to 40% on different workloads.
  • Keywords
    cloud computing; electronic data interchange; mobile computing; Hadoop; LEEN; MapReduce; asynchronous map; data nodes; data transfer; fairness-aware key partitioning; locality-aware key partitioning; partitioning skew; shuffle phase; Degradation; Distributed databases; Kernel; Partitioning algorithms; Performance evaluation; Time factors
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cloud Computing Technology and Science (CloudCom), 2010 IEEE Second International Conference on
  • Conference_Location
    Indianapolis, IN
  • Print_ISBN
    978-1-4244-9405-7
  • Electronic_ISBN
    978-0-7695-4302-4
  • Type

    conf

  • DOI
    10.1109/CloudCom.2010.25
  • Filename
    5708429