• DocumentCode
    3199354
  • Title

    Grouping Blocks for MapReduce Co-Locality

  • Author

    Xiao Yu ; Bo Hong

  • Author_Institution
    Georgia Inst. of Technol., Atlanta, GA, USA
  • fYear
    2015
  • fDate
    25-29 May 2015
  • Firstpage
    271
  • Lastpage
    280
  • Abstract
    Avoiding off-switch communication is critical to the performance of a MapReduce/Hadoop cluster. Current efforts in Hadoop focus only on minimizing off-switch communication for map tasks, yet reduce tasks still shuffle data across the whole cluster because file blocks (and hence map tasks) are scattered. In this paper, we argue that grouping blocks in a few racks can greatly decrease the amount of off-switch data exchange and therefore shorten the execution time of jobs in the cluster. We propose mechanisms to place data in a grouped fashion and to schedule tasks accordingly. We explore the trade-off between reduced off-switch communication and loss of parallelism, and we discuss methods to mitigate that loss. Extensive experiments show that our method significantly reduces off-switch communication and, as a result, decreases job execution time by up to 56%.
  • Keywords
    electronic data interchange; parallel processing; pattern clustering; scheduling; Hadoop cluster; MapReduce cluster; MapReduce co-locality; job execution time; map task; off-switch communication; off-switch data exchange; task scheduling; task shuffle data; Bandwidth; History; Parallel processing; Runtime; Schedules; Scheduling; Switches; Group Data Blocks; Map/Reduce Co-locality; MapReduce/Hadoop;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Processing Symposium (IPDPS), 2015 IEEE International
  • Conference_Location
    Hyderabad
  • ISSN
    1530-2075
  • Type

    conf

  • DOI
    10.1109/IPDPS.2015.16
  • Filename
    7161516