DocumentCode
1661609
Title
Flexible coscheduling: mitigating load imbalance and improving utilization of heterogeneous resources
Author
Frachtenberg, Eitan ; Feitelson, Dror G. ; Petrini, Fabrizio ; Fernandez, Juan
Author_Institution
Comput. & Computational Sci. Div., Los Alamos Nat. Lab., NM, USA
fYear
2003
Abstract
Fine-grained parallel applications require all their processes to run simultaneously on distinct processors to achieve good efficiency. This is typically accomplished by space slicing, wherein nodes are dedicated for the duration of the run, or by gang scheduling, wherein time slicing is coordinated across processors. Both schemes suffer from fragmentation, where processors are left idle because jobs cannot be packed with perfect efficiency. Obviously, this leads to reduced utilization and sub-optimal performance. Flexible coscheduling (FCS) solves this problem by monitoring each job's granularity and communication activity, and using gang scheduling only for those jobs that require it. Processes from other jobs, which can be scheduled without any constraints, are used as filler to reduce fragmentation. In addition, inefficiencies due to load imbalance and hardware heterogeneity are also reduced because the classification is done on a per-process basis. FCS has been fully implemented as part of the STORM resource manager, and shown to be competitive with gang scheduling and implicit coscheduling.
Keywords
parallel architectures; resource allocation; workstation clusters; STORM resource manager; cluster computing; communication activity; fine-grained parallel applications; flexible coscheduling; gang scheduling; hardware heterogeneity; heterogeneous clusters; heterogeneous resources utilization; job scheduling; load balancing; load imbalance; parallel architectures; space slicing; time slicing; Application software; Concurrent computing; Grid computing; Hardware; Informatics; Laboratories; Processor scheduling; Resource management; Storms; Yarn;
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel and Distributed Processing Symposium, 2003. Proceedings. International
ISSN
1530-2075
Print_ISBN
0-7695-1926-1
Type
conf
DOI
10.1109/IPDPS.2003.1213191
Filename
1213191