A Coprocessor Sharing-Aware Scheduler for Xeon Phi-Based Compute Clusters

Author

Coviello, Giuseppe ; Cadambi, Srihari ; Chakradhar, Srimat

Author_Institution

NEC Labs. America, Inc., Princeton, NJ, USA

fYear

2014

fDate

19-23 May 2014

Firstpage

337

Lastpage

346

Abstract

We propose a cluster scheduling technique for compute clusters with Xeon Phi coprocessors. Even though the Xeon Phi runs Linux, which allows multiprocessing, cluster schedulers generally do not allow jobs to share coprocessors because sharing can cause oversubscription of coprocessor memory and thread resources. It has been shown that memory or thread oversubscription on a many-core processor such as the Phi results in job crashes or drastic performance loss. We first show that such an exclusive device allocation policy causes severe coprocessor underutilization: for typical workloads, on average only 38% of the Xeon Phi cores are busy across the cluster. Then, to improve coprocessor utilization, we propose a scheduling technique that enables safe coprocessor sharing without resource oversubscription. Jobs specify their maximum memory and thread requirements, and our scheduler packs as many jobs as possible on each coprocessor in the cluster, subject to resource limits. We solve this problem using a greedy approach at the cluster level combined with a knapsack-based algorithm for each node. Every coprocessor is modeled as a knapsack, and jobs are packed into each knapsack with the goal of maximizing job concurrency, i.e., having as many jobs as possible executing on each coprocessor. Given a set of jobs, we show that this strategy of packing for high concurrency is a good proxy for (i) reducing makespan, without the need for users to specify job execution times, and (ii) reducing coprocessor footprint, i.e., the number of coprocessors required to finish the jobs without increasing makespan. We implement the entire system as a seamless add-on to Condor, a popular distributed job scheduler, and show makespan and footprint reductions of more than 50% across a wide range of workloads.
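
The packing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each job declares a maximum memory (in whole GB) and a maximum thread count, treats one coprocessor as a two-constraint 0/1 knapsack in which every job has value 1 (so the optimum is the largest feasible set of jobs), and fills coprocessors one after another as a simple stand-in for the cluster-level greedy step. The names `Job`, `pack_one`, and `schedule`, and the capacity figures in the usage note, are hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class Job(NamedTuple):
    name: str     # assumed to be unique per job
    mem_gb: int   # declared maximum memory, in whole GB (assumed unit)
    threads: int  # declared maximum thread count


def pack_one(jobs: List[Job], mem_cap: int, thread_cap: int) -> List[Job]:
    """Pick the largest set of jobs that fits one coprocessor.

    Two-constraint 0/1 knapsack with unit value per job, solved by
    dynamic programming over (memory used, threads used) states.
    """
    # best[(m, t)] = largest job set found that uses exactly m GB and t threads
    best: Dict[Tuple[int, int], Tuple[Job, ...]] = {(0, 0): ()}
    for job in jobs:
        # Iterate over a snapshot so each job is added at most once per set.
        for (m, t), chosen in list(best.items()):
            nm, nt = m + job.mem_gb, t + job.threads
            if nm > mem_cap or nt > thread_cap:
                continue  # adding this job would oversubscribe the device
            candidate = chosen + (job,)
            if len(candidate) > len(best.get((nm, nt), ())):
                best[(nm, nt)] = candidate
    return list(max(best.values(), key=len))


def schedule(
    jobs: List[Job], coprocessors: Dict[str, Tuple[int, int]]
) -> Tuple[Dict[str, List[Job]], List[Job]]:
    """Greedy cluster-level loop: pack each coprocessor in turn.

    `coprocessors` maps a device name to (mem_cap_gb, thread_cap).
    Returns the placement per device and the jobs left pending.
    """
    pending = list(jobs)
    placement: Dict[str, List[Job]] = {}
    for device, (mem_cap, thread_cap) in coprocessors.items():
        chosen = pack_one(pending, mem_cap, thread_cap)
        placement[device] = chosen
        for job in chosen:
            pending.remove(job)
    return placement, pending


if __name__ == "__main__":
    # Illustrative values only: two devices with 8 GB and 240 threads each.
    demo_jobs = [
        Job("A", 4, 120),
        Job("B", 3, 60),
        Job("C", 2, 60),
        Job("D", 6, 200),
        Job("E", 1, 60),
    ]
    placement, pending = schedule(demo_jobs, {"mic0": (8, 240), "mic1": (8, 240)})
    for device, chosen in placement.items():
        print(device, [job.name for job in chosen])
    print("pending", [job.name for job in pending])
```

In this sketch, no job set placed on a device exceeds its declared memory or thread capacity, which mirrors the "no oversubscription" constraint described above. Maximizing the number of co-resident jobs requires no job execution times, which is the property the abstract relies on when it uses concurrency as a proxy for makespan.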

Keywords

coprocessors; greedy algorithms; multiprocessing systems; pattern clustering; processor scheduling; Condor; Linux; Xeon Phi-based compute clusters; cluster scheduling technique; coprocessor footprint reduction; coprocessor memory oversubscription; coprocessor sharing-aware scheduler; coprocessor underutilization; coprocessor utilization; distributed job scheduler; exclusive device allocation policy; greedy approach; job concurrency maximization; knapsack-based algorithm; multiprocessing; performance loss; thread oversubscription; thread resources; Concurrent computing; Hardware; Instruction sets; Memory management; Servers; Middleware; high performance computing

fLanguage

English

Publisher

IEEE

Conference_Titel

Parallel and Distributed Processing Symposium, 2014 IEEE 28th International

Conference_Location

Phoenix, AZ

ISSN

1530-2075

Print_ISBN

978-1-4799-3799-8

Type

conf

DOI

10.1109/IPDPS.2014.44

Filename

6877268