Title :
Portable petaFLOP/s programming: applying distributed computing methodology to the grid within a single machine room
Author :
Woodward, Paul R. ; Anderson, S.E.
Author_Institution :
Lab. of Comput. Sci. & Eng., Minnesota Univ., Minneapolis, MN, USA
Abstract :
According to today's best projections, petaFLOP/s computing platforms will combine deep memory hierarchies, in both latency and bandwidth, with a need for many-thousand-fold parallelism. Unless effective parallel programs are prepared in advance, much of the promise of these systems' first year or two of operation may be lost. We introduce a candidate portable petaFLOP/s programming model that enables these important early application programs to be developed while, at the same time, permitting the same applications to run efficiently on the most capable computing systems now available. An MPI-based model is portable, but its programming paradigm ignores the potential benefits of hardware support for shared memory within each network node. A threads-based model cannot directly cope with the distributed nature of the memory over the network. A new, portable programming model is therefore needed. The shared memory programming model dramatically simplifies the expression of dynamic load balancing strategies for irregular algorithms. Its main strategy is a transparent, self-scheduled task list whose entries execute in parallel so long as specified data-dependent conditions are met. The target platform is a cluster of multiprocessor distributed shared memory machines with network-attached disks. Our experimental run-time system allows the programmer to view this platform as a single machine with a four-stage memory hierarchy: coherent processor cache, non-coherent local shared memory, global shared memory, and a global disk file system.
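The self-scheduled task list described in the abstract can be illustrated with a minimal sketch. This is not the paper's run-time system or API; it is a hypothetical illustration, assuming a shared counter from which worker threads atomically claim the next task, plus per-task completion flags standing in for the "data-dependent conditions" that gate execution:

```python
import threading

# Hypothetical sketch of a transparent, self-scheduled task list:
# workers atomically claim the next entry from a shared counter and
# run it once its data-dependent condition (here, completion of the
# preceding task) is satisfied. Dynamic load balance comes from the
# claim order, not from any static partition of the work.
N_TASKS = 64
claim_lock = threading.Lock()
next_task = 0                 # shared self-scheduling counter
done = [False] * N_TASKS      # completion flags = data-dependent conditions
out = [0] * N_TASKS           # each task's result

def worker():
    global next_task
    while True:
        with claim_lock:      # atomically claim the next list entry
            i = next_task
            next_task += 1
        if i >= N_TASKS:      # task list exhausted
            return
        while i > 0 and not done[i - 1]:
            pass              # spin until the dependency is met
        out[i] = 1 if i == 0 else out[i - 1] + 1   # stand-in for real work
        done[i] = True

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(out[-1])   # 64
```

Because tasks are claimed in list order and task 0 has no guard, every spinning worker waits on a task that some other worker has already claimed, so the chain always makes progress regardless of the thread count.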
Keywords :
distributed shared memory systems; parallel programming; resource allocation; software portability; 1 PFLOPS; application program development; coherent processor cache; data-dependent conditions; deep memory hierarchies; dynamic load balancing strategies; four-stage memory hierarchy; global disk file system; global shared memory; hardware support; irregular algorithms; machine room grid; memory bandwidth; memory latency; multiprocessor distributed shared memory machine cluster; network-attached disks; noncoherent local shared memory; parallel programs; portable petaFLOPS programming; run-time system; shared memory programming model; transparent self-scheduled task list; Bandwidth; Clustering algorithms; Concurrent computing; Delay; Distributed computing; Dynamic programming; Hardware; Load management; Parallel processing; Portable computers;
Conference_Titel :
Proceedings of the Eighth International Symposium on High Performance Distributed Computing, 1999
Conference_Location :
Redondo Beach, CA, USA
Print_ISBN :
0-7803-5681-0
DOI :
10.1109/HPDC.1999.805284