Title :
Extreme-Scale Viability of Collective Communication for Resilient Task Scheduling and Work Stealing
Author :
Wilke, Joachim ; Bennett, Jonathan ; Kolla, Hemanth ; Teranishi, K. ; Slattengren, Nicole ; Floren, John
Author_Institution :
Sandia National Laboratories, Scalable Modeling & Analysis, Livermore, CA, USA
Abstract :
Extreme-scale computing will bring significant changes to high performance computing system architectures. In particular, the increased number of system components is creating a need for software to demonstrate "pervasive parallelism" and resiliency. Asynchronous, many-task programming models show promise in addressing both the scalability and resiliency challenges; however, they introduce an enormously challenging distributed, resilient consistency problem. In this work, we explore the viability of resilient collective communication in task scheduling and work stealing, and we use simulation with SST/macro to evaluate the performance of these collectives on speculative extreme-scale architectures.
Keywords :
object-oriented programming; parallel programming; scheduling; software architecture; software prototyping; SST/macro; distributed resilient consistency problem; extreme-scale architectures; extreme-scale computing; extreme-scale viability; high performance computing system architectures; many-task programming models; pervasive parallelism; resilient collective communication viability; resilient task scheduling; software resiliency; system components; work stealing; Analytical models; Bandwidth; Parallel processing; Resilience; Scalability; Three-dimensional displays; Topology; asynchronous programming models; fault tolerant collectives; structural simulation;
Conference_Title :
Dependable Systems and Networks (DSN), 2014 44th Annual IEEE/IFIP International Conference on
Conference_Location :
Atlanta, GA
DOI :
10.1109/DSN.2014.105