DocumentCode
2925232
Title
Novel micro-threading techniques on the Cell Broadband Engine
Author
Ahmed, Mohamed F. ; Ammar, Reda A. ; Rajasekaran, Sanguthevar
Author_Institution
Dept. of Comput. Sci. & Eng., Univ. of Connecticut, Storrs, CT, USA
fYear
2009
fDate
5-8 July 2009
Firstpage
570
Lastpage
575
Abstract
The Cell Broadband Engine (CBE) is a heterogeneous multi-core processor with unique design properties for high-performance computing. It consists of one power processing element (PPE) and eight synergistic processing elements (SPEs) connected by the Element Interconnect Bus (EIB). It employs some novel techniques, such as a software-managed cache, to hide memory latency and guarantees, by default, maximum utilization of the overall system resources. However, exploiting these facilities requires complex algorithm designs and implementations to get the best performance. In this paper we discuss our micro-threading model, realized by a nano-kernel implemented on top of each SPE. The SPE's Nano-kernel, or SPENK, employs the micro-threading model to increase CBE resource utilization while simplifying the programming model. Our framework boosted the processor's overall performance by a factor of five compared to the current threading model. It allowed us to build a distributed model for SPE task management and automated local storage (LS) management. We tested our framework on two types of algorithms: (1) uniform memory access algorithms, such as parallel summation, and (2) non-uniform or irregular memory access algorithms, specifically the parallel tree spanning algorithm. We have also investigated the optimal parameterization of micro-threads on each SPE to automatically reach the best possible performance. With proper parameterization of micro-threads, we achieved a three- to fivefold performance improvement.
Keywords
microprocessor chips; operating system kernels; parallel machines; resource allocation; storage management; CBE resource utilization; Cell Broadband Engine; SPENK; automated local storage management; distributed model; elements interconnect network; heterogeneous multicore processor; high-performance computing; irregular memory access algorithm; maximum overall system resource utilization; memory latency; microthreading technique; nanokernel; nonuniform memory access algorithm; parallel summation; parallel tree spanning algorithm; power processing element; software managed cache; synergistic processing element; task management; uniform memory access algorithm; Algorithm design and analysis; Delay; Engines; Memory management; Multicore processing; Power system interconnection; Power system management; Process design; Resource management; Storage automation;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computers and Communications, 2009. ISCC 2009. IEEE Symposium on
Conference_Location
Sousse
ISSN
1530-1346
Print_ISBN
978-1-4244-4672-8
Electronic_ISBN
1530-1346
Type
conf
DOI
10.1109/ISCC.2009.5202256
Filename
5202256