Title :
Programmable and Scalable Reductions on Clusters
Author :
Ciesko, Jan ; Bueno, J. ; Puzovic, Nikola ; Ramirez, Adrian ; Badia, R.M. ; Labarta, Jesus
Author_Institution :
Barcelona Supercomputing Center, Barcelona, Spain
Abstract :
Reductions matter and they are here to stay. Wide adoption of parallel processing hardware in a broad range of computer applications has encouraged recent research efforts on the efficient parallelization of reductions. Furthermore, trends towards high-productivity languages in mainstream computing increase the demand for efficient programming support. In this paper we present a new approach to parallel reductions for distributed memory systems that provides both scalability and programmability. Using OmpSs, a task-based parallel programming model, the developer can express scalable reductions through a single pragma annotation. This pragma annotation is applicable to tasks as well as to work-sharing constructs (with implicit tasking) and instructs the compiler to generate the required runtime calls. The supporting runtime handles data and task distribution, parallel execution and data reduction. Scalability is achieved through a software cache that maximizes local and temporal data reuse and allows computation and communication to overlap. Results confirm scalability for up to 32 12-core cluster nodes.
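To illustrate what expressing a reduction through a single pragma annotation can look like, the following is a minimal sketch in C. It uses the standard OpenMP `reduction` clause on a work-sharing loop as a stand-in; it is not the OmpSs syntax or the cluster runtime described in the abstract, and the function name `sum_array` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums n integers using a standard OpenMP
 * reduction clause (an assumption for illustration, not the
 * OmpSs annotation presented in the paper). */
long sum_array(const int *data, size_t n)
{
    assert(data != NULL || n == 0);

    long sum = 0;

    /* Each thread accumulates into a private copy of `sum`; the
     * private copies are combined with + when the loop finishes.
     * If OpenMP is not enabled at compile time, the pragma is
     * ignored and the loop runs serially with the same result. */
    #pragma omp parallel for reduction(+ : sum)
    for (long i = 0; i < (long)n; ++i) {
        sum += data[i];
    }

    return sum;
}
```

The single annotation is the only change relative to the serial loop, which is the programmability property the abstract refers to; distributing the data and the partial results across cluster nodes is left to the runtime and is not shown here.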
Keywords :
cache storage; data reduction; distributed memory systems; parallel programming; program compilers; task analysis; 12-core cluster nodes; OmpSs; compiler; computer applications; local data reuse maximization; mainstream computing; parallel execution; parallel processing hardware; parallel reduction; pragma annotation; programmable reduction; runtime call generation; runtime data handling; scalable reduction; software cache; task distribution; task-based parallel programming model; temporal data reuse maximization; work-sharing; Arrays; Histograms; Reactive power; Runtime; Scalability; Software; Vectors; distributed systems; reductions; runtime systems
Conference_Title :
Parallel & Distributed Processing (IPDPS), 2013 IEEE 27th International Symposium on
Conference_Location :
Boston, MA
Print_ISBN :
978-1-4673-6066-1
DOI :
10.1109/IPDPS.2013.63