DocumentCode
1999101
Title
High Performance Adaptive Distributed Scheduling Algorithm
Author
Narang, Arun ; Srivastava, Anurag ; Shyamasundar, R.K.
Author_Institution
IBM India Res. Lab., New Delhi, India
fYear
2013
fDate
20-24 May 2013
Firstpage
1725
Lastpage
1734
Abstract
Exascale computing requires complex runtime systems that must account for affinity and load balancing while keeping time and message complexity low when scheduling massive-scale parallel computations. Considering these objectives simultaneously makes online distributed scheduling a very challenging problem. Prior distributed scheduling approaches are limited to shared memory, rely primarily on work stealing across distributed-memory nodes for load balancing, or depend on programmer-specified affinity. However, the performance of affinity-driven scheduling and work-stealing-based algorithms degrades when the input is irregular (UTS) and/or sparse. In this paper we present a novel adaptive distributed scheduling algorithm (ALDS) for multi-place parallel computations that uses a unique combination of remote (inter-place) spawns and remote work steals to reduce scheduler overheads, which helps to dynamically maintain load balance across the compute nodes of the system while preserving affinity as far as possible. Using parallel machine learning algorithms such as Support Vector Regression running concurrently with program execution on the target architecture, ALDS can automatically and adaptively tune the parameters for scalable performance. Our design was implemented using the GASNet API and POSIX threads. For the UTS (Unbalanced Tree Search) benchmark (using up to 2048 nodes of Blue Gene/P), we deliver superior performance over Charm++ [1] and [2].
Keywords
application program interfaces; communication complexity; concurrency control; learning (artificial intelligence); parallel algorithms; regression analysis; resource allocation; scheduling; support vector machines; tree searching; ALDS; Blue Gene/P nodes; GASNet API; POSIX threads; UTS benchmark; affinity; automatic-adaptive parameter tuning; complex runtime systems; concurrency; dynamic load balance maintenance; exascale computing; high-performance adaptive distributed scheduling algorithm; load balancing; massive-scale parallel computation scheduling; message complexity; multiplace parallel computations; online distributed scheduling; overhead reduction; parallel machine learning algorithms; program execution; remote interplace spawns; support vector regression; target architecture; time complexity; unbalanced tree search benchmark; Load management; Machine learning algorithms; Message systems; Scheduling; Scheduling algorithms; Support vector machines; Adaptive Scheduling; Distributed Scheduling; Performance Analysis;
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2013 IEEE 27th International
Conference_Location
Cambridge, MA
Print_ISBN
978-0-7695-4979-8
Type
conf
DOI
10.1109/IPDPSW.2013.232
Filename
6651071
Link To Document