Title :
Phaser accumulators: A new reduction construct for dynamic parallelism
Author :
Shirako, J. ; Peixotto, D.M. ; Sarkar, V. ; Scherer, W.N., III
Author_Institution :
Dept. of Comput. Sci., Rice Univ., Houston, TX, USA
Abstract :
A reduction is a computation in which a common operation, such as a sum, is to be performed across multiple pieces of data, each supplied by a separate task. We introduce phaser accumulators, a new reduction construct that meshes seamlessly with phasers to support dynamic parallelism in a phased (iterative) setting. By separating reduction computations into the parts of sending data, performing the computation itself, and retrieving the result, we enable overlap of communication and computation in a manner analogous to that of split-phase barriers. Additionally, this separation enables exploration of implementation strategies that differ as to when the reduction itself is performed: eagerly when the data is supplied, or lazily when a synchronization point is reached. We implement accumulators as extensions to phasers in the Habanero dialect of the X10 programming language. Performance evaluations of the EPCC Syncbench, Spectral-norm, and CG benchmarks on AMD Opteron, Intel Xeon, and Sun UltraSPARC T2 multicore SMPs show superior performance and scalability over OpenMP reductions (on two platforms) and X10 code (on three platforms) written with atomic blocks, with improvements of up to 2.5× on the Opteron and 14.9× on the UltraSPARC T2 relative to OpenMP and 16.5× on the Opteron, 26.3× on the Xeon and 94.8× on the UltraSPARC T2 relative to X10 atomic blocks. To the best of our knowledge, no prior reduction construct supports the dynamic parallelism and asynchronous capabilities of phaser accumulators.
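The three-part split described in the abstract (send, reduce, retrieve) and the eager versus lazy choice can be illustrated with a minimal sketch. It is not the Habanero implementation: it assumes plain Java with the standard java.util.concurrent.Phaser standing in for Habanero phasers, and the names AccumulatorSketch, SumAccumulator, send, result, and run are hypothetical, chosen only for this illustration.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Phaser;
import java.util.concurrent.atomic.AtomicLong;

final class AccumulatorSketch {

    /** Hypothetical sum accumulator; NOT the Habanero API. */
    static final class SumAccumulator {
        private final boolean eager;
        // Eager strategy: values are combined as soon as they are sent.
        private final AtomicLong running = new AtomicLong();
        // Lazy strategy: values are buffered and combined at the phase boundary.
        private final ConcurrentLinkedQueue<Long> pending = new ConcurrentLinkedQueue<>();
        // Reduced value of the most recently completed phase.
        private volatile long result;

        // onAdvance runs exactly once per phase, after every registered party has arrived.
        final Phaser phaser = new Phaser() {
            @Override
            protected boolean onAdvance(int phase, int registeredParties) {
                long sum = running.getAndSet(0);   // collect the eager partial sum
                Long v;
                while ((v = pending.poll()) != null) {
                    sum += v;                      // perform the lazy reduction now
                }
                result = sum;                      // publish for retrieval
                return false;                      // keep the phaser usable for later phases
            }
        };

        SumAccumulator(boolean eager) {
            this.eager = eager;
        }

        /** Part 1: supply a value for the current phase. */
        void send(long value) {
            if (eager) {
                running.addAndGet(value);
            } else {
                pending.add(value);
            }
        }

        /** Part 3: retrieve the reduced value of the last completed phase. */
        long result() {
            return result;
        }
    }

    /** Runs one phase with the given number of tasks; task i contributes i + 1. */
    static long run(boolean eager, int tasks) {
        SumAccumulator acc = new SumAccumulator(eager);
        acc.phaser.bulkRegister(tasks);
        long[] seen = new long[tasks];
        Thread[] threads = new Thread[tasks];
        for (int i = 0; i < tasks; i++) {
            final int id = i;
            threads[i] = new Thread(() -> {
                acc.send(id + 1);
                int phase = acc.phaser.arrive();   // signal without blocking
                // Independent local work could overlap with the reduction here.
                acc.phaser.awaitAdvance(phase);    // wait for the phase to complete
                seen[id] = acc.result();
            });
            threads[i].start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        for (long s : seen) {
            if (s != acc.result()) {
                throw new IllegalStateException("tasks observed different results");
            }
        }
        return acc.result();
    }

    public static void main(String[] args) {
        System.out.println("eager=" + run(true, 8) + " lazy=" + run(false, 8));
    }
}
```

With eight tasks contributing 1 through 8, both strategies produce the sum 36. The separate arrive and awaitAdvance calls play the role of the split-phase barrier mentioned in the abstract: a task that has sent its value can continue with unrelated work before it blocks for the result.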
Keywords :
data reduction; parallel processing; AMD Opteron; CG benchmarks; EPCC Syncbench; Intel Xeon; OpenMP reduction; Spectral norm; Sun UltraSPARC T2 multicore SMP; X10 code; X10 programming language; computation reduction; dynamic parallelism; performance evaluation; phased iterative setting; phaser accumulator; reduction construct; split-phase barriers; synchronization point; Character generation; Computer languages; Computer science; Concurrent computing; Multicore processing; Parallel processing; Parallel programming; Programming profession; Scalability; Sun;
Conference_Title :
Parallel & Distributed Processing, 2009. IPDPS 2009. IEEE International Symposium on
Conference_Location :
Rome
Print_ISBN :
978-1-4244-3751-1
ISSN :
1530-2075
DOI :
10.1109/IPDPS.2009.5161071