DocumentCode
244418
Title
Extreme-Scale Viability of Collective Communication for Resilient Task Scheduling and Work Stealing
Author
Wilke, Jeremiah ; Bennett, Janine ; Kolla, Hemanth ; Teranishi, K. ; Slattengren, Nicole ; Floren, John
Author_Institution
Sandia Nat. Labs., Scalable Modeling & Anal., Livermore, CA, USA
fYear
2014
fDate
23-26 June 2014
Firstpage
756
Lastpage
761
Abstract
Extreme-scale computing will bring significant changes to high performance computing system architectures. In particular, the increased number of system components is creating a need for software to demonstrate "pervasive parallelism" and resiliency. Asynchronous, many-task programming models show promise in addressing both the scalability and resiliency challenges; however, they introduce an enormously challenging distributed, resilient consistency problem. In this work, we explore the viability of resilient collective communication in task scheduling and work stealing and, through simulation with SST/macro, evaluate the performance of these collectives on speculative extreme-scale architectures.
Keywords
object-oriented programming; parallel programming; scheduling; software architecture; software prototyping; SST/macro; distributed resilient consistency problem; extreme-scale architectures; extreme-scale computing; extreme-scale viability; high performance computing system architectures; many-task programming models; pervasive parallelism; resilient collective communication viability; resilient task scheduling; software resiliency; system components; work stealing; Analytical models; Bandwidth; Parallel processing; Resilience; Scalability; Three-dimensional displays; Topology; asynchronous programming models; fault tolerant collectives; structural simulation;
fLanguage
English
Publisher
IEEE
Conference_Titel
Dependable Systems and Networks (DSN), 2014 44th Annual IEEE/IFIP International Conference on
Conference_Location
Atlanta, GA
Type
conf
DOI
10.1109/DSN.2014.105
Filename
6903637