DocumentCode
2720935
Title
Efficient complex matrix multiplication on the Synergistic Processing Element of the Cell processor
Author
Bourgerie, Quentin ; Fortin, Pierre ; Lamotte, Jean-Luc
Author_Institution
Univ. Pierre et Marie Curie, Paris, France
fYear
2010
fDate
20-24 Sept. 2010
Firstpage
1
Lastpage
8
Abstract
In order to implement a complete Fast Multipole Method on the Cell processor, we need an efficient complex matrix multiplication on each Synergistic Processing Element (SPE) of the Cell processor. Since the latest IBM SDK does not provide such a routine, we build our own in single precision in C. We show that complex matrix multiplication requires a specific computation scheme for the micro-kernel running on the SPE, and that a 32×32 tile is appropriate both for close-to-peak computational performance and for overlapping communication with computation. Our micro-kernel delivers 23.74 Gflop/s, which is 92.7% of the SPE peak performance, and we obtain up to 23.65 Gflop/s for one complete complex matrix product on one SPE, and up to 378.36 Gflop/s for 16 products on 16 SPEs.
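The abstract relies on two ideas that a small example can make concrete: a complex matrix product (CGEMM) decomposed into 32×32 tiles, and Gflop/s figures expressed as a fraction of the SPE peak. The sketch below is a portable scalar C99 illustration under stated assumptions, not the authors' SPE micro-kernel: the abstract says that kernel needs a specific computation scheme, which it does not describe, and this sketch uses no SPU SIMD intrinsics and no DMA double-buffering. The function names are hypothetical. The peak helper assumes the usual SPE figures (3.2 GHz clock, one 4-wide single-precision fused multiply-add per cycle, so 8 flops per cycle), which give 25.6 Gflop/s and are consistent with the quoted 92.7%.

```c
#include <assert.h>
#include <complex.h>

#define TILE 32 /* tile size quoted in the abstract */

/* Hypothetical scalar micro-kernel: accumulate one TILE x TILE block,
   C[i0.., j0..] += A[i0.., k0..] * B[k0.., j0..], row-major, leading dim n. */
static void cgemm_tile(int n, const float complex *A, const float complex *B,
                       float complex *C, int i0, int j0, int k0)
{
    for (int i = i0; i < i0 + TILE; ++i)
        for (int k = k0; k < k0 + TILE; ++k) {
            const float complex a = A[i * n + k]; /* reused across the j loop */
            for (int j = j0; j < j0 + TILE; ++j)
                C[i * n + j] += a * B[k * n + j];
        }
}

/* C += A * B for n x n row-major single-precision complex matrices.
   Assumes n is a multiple of TILE (no edge handling in this sketch). */
void cgemm_tiled(int n, const float complex *A, const float complex *B,
                 float complex *C)
{
    assert(n % TILE == 0);
    for (int i0 = 0; i0 < n; i0 += TILE)
        for (int j0 = 0; j0 < n; j0 += TILE)
            for (int k0 = 0; k0 < n; k0 += TILE)
                cgemm_tile(n, A, B, C, i0, j0, k0);
}

/* Real flops of an n x n x n complex product: each complex multiply-add
   costs 4 real multiplications + 4 real additions = 8 flops. */
double cgemm_flops(int n)
{
    return 8.0 * (double)n * (double)n * (double)n;
}

/* Assumed SPE single-precision peak in Gflop/s:
   3.2 GHz * 4 SIMD lanes * 2 flops per fused multiply-add = 25.6. */
double spe_peak_gflops(void)
{
    return 3.2 * 4.0 * 2.0;
}
```

With these assumptions, `100.0 * 23.74 / spe_peak_gflops()` evaluates to about 92.7, matching the micro-kernel figure, and 378.36 Gflop/s on 16 SPEs corresponds to roughly 92.4% of the aggregate 409.6 Gflop/s. Timing one call of `cgemm_tiled` and dividing `cgemm_flops(n)` by the elapsed time gives a Gflop/s number comparable in definition (not in magnitude) to those reported.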
Keywords
C language; matrix multiplication; multiprocessing systems; C programming; Cell processor; complex matrix multiplication; fast multipole method; microkernel; synergistic processing element; Blades; Computer architecture; Laplace equations; Microprocessors; Pipelines; Tiles; CGEMM; SPE
fLanguage
English
Publisher
IEEE
Conference_Title
Cluster Computing Workshops and Posters (CLUSTER WORKSHOPS), 2010 IEEE International Conference on
Conference_Location
Heraklion, Crete
Print_ISBN
978-1-4244-8395-2
Electronic_ISBN
978-1-4244-8397-6
Type
conf
DOI
10.1109/CLUSTERWKSP.2010.5613077
Filename
5613077