Title :
On the Efficient Implementation of Reductions on the Cell Broadband Engine
Author_Institution :
Inst. of Comput. Sci., Univ. of Innsbruck, Innsbruck, Austria
Abstract :
Efficient realizations of combining communication patterns such as reduce and all-reduce are important for high-performance parallel implementations of many scientific algorithms. On the Cell Broadband Engine in particular, a low-latency realization of such operations is not obvious. This paper therefore discusses several algorithms for implementing reductions and proposes efficient implementations on the Cell. Detailed performance results are presented for reductions of vectors of various sizes on a Cell blade consisting of two interconnected Cell processors. The results show that the new reduction algorithms are in most cases faster than previously published implementations.
Keywords :
computer architecture; multiprocessing systems; cell blade; cell broadband engine; communication patterns; interconnected cell processors; reductions algorithms; Bandwidth; Blades; Clocks; Clustering algorithms; Computer science; Concurrent computing; Distributed computing; Engines; Synchronization; Yarn; Cell Broadband Engine; Reductions;
Conference_Title :
Parallel, Distributed and Network-Based Processing (PDP), 2010 18th Euromicro International Conference on
Conference_Location :
Pisa
Print_ISBN :
978-1-4244-5672-7
Electronic_ISBN :
1066-6192
DOI :
10.1109/PDP.2010.59