  • DocumentCode
    168667
  • Title
    MIMP: Deadline and Interference Aware Scheduling of Hadoop Virtual Machines
  • Author
    Zhang, Wei ; Rajasekaran, Sanguthevar ; Wood, Tim ; Zhu, Mingfa
  • Author_Institution
    George Washington Univ., Washington, DC, USA
  • fYear
    2014
  • fDate
    26-29 May 2014
  • Firstpage
    394
  • Lastpage
    403
  • Abstract
    Virtualization promised to dramatically increase server utilization levels, yet many data centers are still only lightly loaded. In some ways, big data applications are an ideal fit for using this residual capacity to perform meaningful work, but the high level of interference between interactive and batch processing workloads currently prevents this from being a practical solution in virtualized environments. Further, the variable nature of spare capacity may make it difficult to meet big data application deadlines. In this work we propose two schedulers: one in the virtualization layer designed to minimize interference on high-priority interactive services, and one in the Hadoop framework that helps batch processing jobs meet their own performance deadlines. Our approach uses performance models to match Hadoop tasks to the servers that will benefit them the most, and deadline-aware scheduling to effectively order incoming jobs. The combination of these schedulers allows data center administrators to safely mix resource-intensive Hadoop jobs with latency-sensitive web applications, and still achieve predictable performance for both. We have implemented our system using Xen and Hadoop, and our evaluation shows that our schedulers allow a mixed cluster to reduce web response times by more than tenfold, while meeting more Hadoop deadlines and lowering total task execution times by 6.5%.
  • Keywords
    Big Data; Internet; batch processing (computers); computer centres; file servers; scheduling; virtual machines; virtualisation; Hadoop deadlines; Hadoop framework; Hadoop virtual machines; MIMP; Web response times; Xen; batch processing jobs; batch processing workloads; big data application deadlines; data centers; deadline-aware scheduling; high priority interactive services; interactive processing workloads; interference aware scheduling; interference minimization; latency sensitive Web applications; performance deadlines; resource intensive Hadoop jobs; server utilization levels; task execution times; virtualization layer; Batch production systems; Interference; Random access memory; Servers; Time factors; Virtual machining; Virtualization; MapReduce; deadlines; interference; scheduling; virtualization
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cluster, Cloud and Grid Computing (CCGrid), 2014 14th IEEE/ACM International Symposium on
  • Conference_Location
    Chicago, IL
  • Type
    conf
  • DOI
    10.1109/CCGrid.2014.101
  • Filename
    6846475