DocumentCode :
2635970
Title :
Performance Analysis of Cell Broadband Engine for High Memory Bandwidth Applications
Author :
Jiménez-Gonzalez, Daniel ; Martorell, Xavier ; Ramírez, Alex
Author_Institution :
Dept. of Comput. Archit., Univ. Politecnica de Catalunya, Barcelona
fYear :
2007
fDate :
25-27 April 2007
Firstpage :
210
Lastpage :
219
Abstract :
The Cell Broadband Engine (CBE) is designed to be a general-purpose platform offering enormous arithmetic performance thanks to its eight SIMD-only synergistic processor elements (SPEs), capable of achieving 134.4 GFLOPS (16.8 GFLOPS * 8) at 2.1 GHz, and a 64-bit Power processor element (PPE). Each SPE has a 256 KB non-coherent local memory and communicates with other SPEs and main memory through its DMA controller. CBE main memory is connected to all the CBE processor elements (PPE and SPEs) through the element interconnect bus (EIB), which has a peak bandwidth of 134.4 GB/s when running at half the processor speed. The CBE platform is therefore suitable for applications that use MPI and streaming programming models, with a potentially high peak performance. In this paper we focus on the communication part of those applications and measure the actual memory bandwidth that each of the CBE processor components can sustain. We have measured the sustained bandwidth between the PPE and memory, between an SPE and memory, between two individual SPEs (to determine whether this bandwidth depends on their physical location), between pairs of SPEs (to achieve maximum bandwidth in nearly ideal conditions), and in a cycle of SPEs representing a streaming kind of computation. Our results on a real machine show that, when some strict programming rules are followed, individual SPE-to-SPE communication almost achieves the peak bandwidth when the DMA controllers transfer memory chunks of at least 1024 bytes. In addition, SPE-to-memory bandwidth should be considered in streaming programming. For instance, implementing two data streams using 4 SPEs each can be more efficient than having a single data stream using all 8 SPEs.
Keywords :
parallel processing; storage management; DMA controller; arithmetic performance analysis; bandwidth performance peak; cell broadband engine; data stream; direct memory access; element interconnect bus; memory bandwidth application; message passing interface; processor component; processor speed; single instruction multiple data; streaming programming model; synergistic processor element; Application software; Bandwidth; Computer architecture; Electronic mail; Engines; Instruction sets; Microwave integrated circuits; Performance analysis; Random access memory; Registers;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Performance Analysis of Systems & Software, 2007. ISPASS 2007. IEEE International Symposium on
Conference_Location :
San Jose, CA
Print_ISBN :
1-4244-1082-7
Electronic_ISBN :
1-4244-1082-7
Type :
conf
DOI :
10.1109/ISPASS.2007.363751
Filename :
4211037