• DocumentCode
    2018005
  • Title

    Performance analysis of Coupling Scheduler for MapReduce/Hadoop

  • Author

    Tan, Jian ; Meng, Xiaoqiao ; Zhang, Li

  • Author_Institution
    IBM T. J. Watson Res. Center, Hawthorne, NY, USA
  • fYear
    2012
  • fDate
    25-30 March 2012
  • Firstpage
    2586
  • Lastpage
    2590
  • Abstract
    For MapReduce/Hadoop, map and reduce phases exhibit fundamentally distinguishing characteristics. Additionally, these two phases admit complicated and tight dependency on each other, causing the repeatedly observed starvation problem with the widely used Fair Scheduler. To mitigate this problem, we design Coupling Scheduler, which, among other new features, jointly schedules map and reduce tasks by coupling their progresses, different from existing ones that treat them separately. This design is based on the intuition that allocating excess resources to reduce tasks without balancing with the map task progress of the same job is likely to result in resource underutilization since a job is deemed done only when both phases complete. In order to analytically understand the performance of this design, we propose a model that captures the fundamental scheduling characteristics for MapReduce. Specifically, the map phase is modeled by a processor sharing queue, and the reduce phase by a “sticky processor sharing” queue. Along with the important dependence between these two types of tasks, we show that, for a class of jobs with regularly varying map service times, the job processing time distribution under Coupling Scheduler can be one order better than Fair Scheduler. These theoretical results are validated through simulations and the improved performance is further illustrated through real experiments on our testbed.
  • Keywords
    parallel processing; performance evaluation; processor scheduling; public domain software; queueing theory; resource allocation; Hadoop; MapReduce; coupling scheduler; excess resource allocation; fundamental scheduling characteristics; job processing time distribution; map phase; map scheduling characteristics; map service times; performance analysis; resource underutilization; starvation problem; sticky processor sharing queue; task reduction; Bismuth; Couplings; Delay; Indexes; Processor scheduling; Upper bound;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    INFOCOM, 2012 Proceedings IEEE
  • Conference_Location
    Orlando, FL
  • ISSN
    0743-166X
  • Print_ISBN
    978-1-4673-0773-4
  • Type

    conf

  • DOI
    10.1109/INFCOM.2012.6195658
  • Filename
    6195658