• DocumentCode
    3256186
  • Title

    Task-Cloning Algorithms in a MapReduce Cluster with Competitive Performance Bounds

  • Author

    Huanle Xu ; Wing Cheong Lau

  • Author_Institution
    Dept. of Inf. Eng., Chinese Univ. of Hong Kong, Hong Kong, China
  • fYear
    2015
  • fDate
    June 29 2015-July 2 2015
  • Firstpage
    339
  • Lastpage
    348
  • Abstract
    Job scheduling for a MapReduce cluster has been an active research topic in recent years. However, measurement traces from real-world production environments show that the durations of tasks within a job vary widely. The overall elapsed time of a job, i.e., the so-called flow time, is often dictated by one or a few slow-running tasks within the job, generally referred to as "stragglers". Causes of stragglers include tasks running on partially or intermittently failing machines and localized resource bottlenecks within a MapReduce cluster. To tackle this online job scheduling challenge, we adopt the task-cloning approach and design corresponding scheduling algorithms, based on the Shortest Remaining Processing Time (SRPT) scheduler, that aim to minimize the weighted sum of job flow times in a MapReduce cluster. More specifically, we first design a 2-competitive offline algorithm for the case where the variance of task duration is negligible. We then extend this offline algorithm to yield the so-called SRPTMS+C algorithm for the online case and show that SRPTMS+C is (1 + ϵ)-speed o(1/ϵ²)-competitive in reducing the weighted sum of job flow times within a cluster. Both algorithms explicitly consider the precedence constraints between the two phases of the MapReduce framework. We also demonstrate via trace-driven simulations that SRPTMS+C can significantly reduce the weighted and unweighted sums of job flow times by substantially cutting the elapsed time of small jobs. In particular, SRPTMS+C beats the Microsoft Mantri scheme by nearly 25% according to this metric.
  • Keywords
    data handling; parallel processing; pattern clustering; MapReduce cluster; Microsoft Mantri scheme; SRPTMS+C algorithm; competitive performance bounds; failing machines; localized resource bottleneck; offline algorithm; online job scheduling challenge; real-world production environment; shortest remaining processing time scheduler; stragglers; task-cloning algorithms; Conferences; Distributed computing; MapReduce; SRPT; cloning; competitive bound; job Scheduling; weighted job flowtime;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Distributed Computing Systems (ICDCS), 2015 IEEE 35th International Conference on
  • Conference_Location
    Columbus, OH
  • ISSN
    1063-6927
  • Type

    conf

  • DOI
    10.1109/ICDCS.2015.42
  • Filename
    7164920