DocumentCode :
1649041
Title :
Latency Hiding and Performance Tuning with Graph-Based Execution
Author :
Cicotti, Pietro ; Baden, Scott B.
Author_Institution :
San Diego Supercomput. Center, Univ. of California, San Diego, San Diego, CA, USA
fYear :
2011
Firstpage :
28
Lastpage :
37
Abstract :
In current practice, scientific programmers and HPC users are required to develop code that exposes a high degree of parallelism, exhibits high locality, dynamically adapts to the available resources, and hides communication latency. Hiding communication latency is crucial to realizing the potential of today's distributed memory machines with highly parallel processing modules, and technological trends indicate that communication latencies will continue to be an issue as the performance gap between computation and communication widens. However, under Bulk Synchronous Parallel models, the predominant paradigm in scientific computing, scheduling is embedded into the application code. All the phases of a computation are defined and laid out as a linear sequence of operations, limiting overlap and the program's ability to adapt to communication delays. In this paper we present an alternative model, called Tarragon, to overcome the limitations of Bulk Synchronous Parallelism. Tarragon, which is based on dataflow, targets latency-tolerant scientific computations. Tarragon supports a task-dependency graph abstraction in which tasks, the basic unit of computation, are organized as a graph according to their data dependencies, i.e., task precedence. In addition to the task graph, Tarragon supports metadata abstractions, annotations to the task graph, to express locality information and scheduling policies to improve performance. Tarragon's functionality and underlying programming methodology are demonstrated on three classes of computations used in scientific domains: structured grids, sparse linear algebra, and dynamic programming. In the application studies, Tarragon implementations achieve high performance, in many cases exceeding the performance of equivalent latency-tolerant, hand-coded MPI implementations.
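The abstract's central idea, executing tasks in dataflow order as their data dependencies are satisfied rather than in a fixed bulk-synchronous sequence, can be illustrated with a minimal sketch. The names below (`Task`, `TaskGraph`) are hypothetical and do not reflect Tarragon's actual API; a real runtime would dispatch ready tasks to worker threads so that communication and computation overlap, whereas this sketch runs them serially to show the ordering logic.

```python
# Hypothetical sketch of task-dependency-graph execution in the spirit of
# the paper's model; class and method names are illustrative, not Tarragon's.
from collections import defaultdict, deque


class Task:
    def __init__(self, name, work):
        self.name = name
        self.work = work  # callable taking predecessor results, returning this task's result


class TaskGraph:
    def __init__(self):
        self.tasks = {}
        self.preds = defaultdict(set)  # task name -> names of predecessor tasks
        self.succs = defaultdict(set)  # task name -> names of successor tasks

    def add_task(self, task):
        self.tasks[task.name] = task

    def add_dependency(self, before, after):
        # 'before' must complete before 'after' may run (task precedence).
        self.preds[after].add(before)
        self.succs[before].add(after)

    def run(self):
        # Dataflow execution: a task becomes ready once all its
        # predecessors have produced their results.
        indegree = {name: len(self.preds[name]) for name in self.tasks}
        ready = deque(name for name, d in indegree.items() if d == 0)
        results, order = {}, []
        while ready:
            name = ready.popleft()
            inputs = [results[p] for p in sorted(self.preds[name])]
            results[name] = self.tasks[name].work(*inputs)
            order.append(name)
            for succ in self.succs[name]:
                indegree[succ] -= 1
                if indegree[succ] == 0:
                    ready.append(succ)
        return results, order
```

For example, a diamond-shaped graph (a feeding b and c, which both feed d) executes a first and d last, with b and c free to run in either order, which is exactly the scheduling freedom a latency-hiding runtime exploits.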
Keywords :
application program interfaces; distributed memory systems; dynamic programming; graph theory; linear algebra; natural sciences computing; parallel processing; HPC users; Tarragon; bulk synchronous parallel models; bulk synchronous parallelism; communication latency; data dependencies; dataflow; distributed memory machines; dynamic programming; graph-based execution; hard coded MPI implementations; high parallelism degree; latency hiding; latency tolerant scientific computations; metadata abstractions; parallel processing modules; performance tuning; scientific programmer; sparse linear algebra; structured grids; task-dependency graph abstraction; Computational modeling; Computer architecture; Hardware; Object oriented modeling; Processor scheduling; Programming; dataflow; distributed computing; high performance computing; parallel computing; programming model; runtime; scientific computing;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Data-Flow Execution Models for Extreme Scale Computing (DFM), 2011 First Workshop on
Conference_Location :
Galveston Island, TX
Print_ISBN :
978-1-4673-0709-3
Type :
conf
DOI :
10.1109/DFM.2011.15
Filename :
6176402