Efficient complex matrix multiplication on the Synergistic Processing Element of the Cell processor

Author

Bourgerie, Quentin ; Fortin, Pierre ; Lamotte, Jean-Luc

Author_Institution

Univ. Pierre et Marie Curie, Paris, France

fYear

2010

fDate

20-24 Sept. 2010

Firstpage

1

Lastpage

8

Abstract

In order to implement a complete Fast Multipole Method on the Cell processor, we need an efficient complex matrix multiplication on each Synergistic Processing Element (SPE). Since the latest IBM SDK does not provide such a routine, we build our own in single precision in C. We show that complex matrix multiplication requires a specific computation scheme for the micro-kernel running on the SPE, and that a 32×32 tile is appropriate both for close-to-peak computation and for overlapping communication with computation. Our micro-kernel delivers 23.74 Gflop/s, which is 92.7% of the SPE peak performance; we obtain up to 23.65 Gflop/s for one complete complex matrix product on one SPE, and up to 378.36 Gflop/s for 16 products on 16 SPEs.
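The tiling idea in the abstract can be illustrated with a minimal sketch in portable C. This is not the authors' SPE micro-kernel: it uses no SIMD intrinsics and no DMA double buffering, and the names `cfloat`, `cgemm_tile`, `cgemm_tiled` and `cgemm_flops` are hypothetical, as are the interleaved real/imaginary layout and the row-major storage. Only the 32×32 tile size and single precision are taken from the abstract. The flop count assumes the usual convention of 8 real operations per complex multiply-add.

```c
#include <assert.h>
#include <stddef.h>

#define TILE 32 /* tile edge, as reported in the abstract */

/* Interleaved single-precision complex value (assumed layout). */
typedef struct { float re, im; } cfloat;

/* C += A * B on one TILE x TILE tile; row-major, leading dimension ld. */
static void cgemm_tile(const cfloat *a, const cfloat *b, cfloat *c, size_t ld)
{
    for (size_t i = 0; i < TILE; ++i) {
        for (size_t k = 0; k < TILE; ++k) {
            const cfloat x = a[i * ld + k];
            for (size_t j = 0; j < TILE; ++j) {
                const cfloat y = b[k * ld + j];
                /* (x.re + i x.im)(y.re + i y.im), accumulated into C */
                c[i * ld + j].re += x.re * y.re - x.im * y.im;
                c[i * ld + j].im += x.re * y.im + x.im * y.re;
            }
        }
    }
}

/* C += A * B for n x n row-major matrices; n must be a multiple of TILE. */
void cgemm_tiled(size_t n, const cfloat *a, const cfloat *b, cfloat *c)
{
    assert(n % TILE == 0);
    for (size_t i = 0; i < n; i += TILE)
        for (size_t j = 0; j < n; j += TILE)
            for (size_t k = 0; k < n; k += TILE)
                cgemm_tile(&a[i * n + k], &b[k * n + j], &c[i * n + j], n);
}

/* Real flop count of an n x n complex product: 8 per complex multiply-add. */
double cgemm_flops(size_t n)
{
    return 8.0 * (double)n * (double)n * (double)n;
}
```

Dividing `cgemm_flops(n)` by a measured run time gives a rate in flop/s that can be compared with figures such as those quoted above. Three single-precision complex 32×32 tiles occupy 3 × 8 KiB = 24 KiB in this layout.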

Keywords

C language; matrix multiplication; multiprocessing systems; C programming; Cell processor; complex matrix multiplication; fast multipole method; microkernel; synergistic processing element; Blades; Computer architecture; Laplace equations; Microprocessors; Pipelines; Tiles; CGEMM; SPE

fLanguage

English

Publisher

IEEE

Conference_Titel

Cluster Computing Workshops and Posters (CLUSTER WORKSHOPS), 2010 IEEE International Conference on

Conference_Location

Heraklion, Crete

Print_ISBN

978-1-4244-8395-2

Electronic_ISBN

978-1-4244-8397-6

Type

conf

DOI

10.1109/CLUSTERWKSP.2010.5613077

Filename

5613077