DocumentCode
167385
Title
Prototyping the MBTAC Processor for the REPLICA CMP
Author
Forsell, Martti ; Roivainen, Jussi ; Leppanen, Ville
Author_Institution
VTT Tech. Res. Centre of Finland, Oulu, Finland
fYear
2014
fDate
19-23 May 2014
Firstpage
709
Lastpage
716
Abstract
Current chip multiprocessors (CMPs) have mostly been designed by replicating sequential, single-core processors and providing some support for operating them with a shared memory. As a result, they define an asynchronous computational model of threads, often require maximizing the locality of memory references to get decent performance, and feature high intercommunication overheads, which make parallel programming tedious for general-purpose functionalities. Most of these problems can be eliminated by designing the processor architecture for scalable general-purpose computing from the very beginning, as is done in processors for configurable emulated shared memory (CESM) CMPs. They provide support for machine instruction-level synchronization, make use of multithreading to support latency-insensitive computation, and promote the concept of uniform synchronous shared memory for easy variable allocation and convenient data exchange. In our earlier work we proposed the first CESM architecture, TOTAL ECLIPSE, composed of early MBTAC processors that make use of very low-overhead multithreading, a parallel computing savvy functional unit organization, support for fast synchronization between instructions and threads, and highly efficient multioperations. Unfortunately, certain key parts of these processors turned out to be hard to implement, and overall they lacked support for ordered multiprefix operations and full configurability of the CESM scheme. In this paper we introduce a new, fully configurable version of the MBTAC processor for our new REPLICA CESM architecture, together with its first FPGA implementations. To evaluate it, we execute short test programs on it and preliminarily compare it against Intel Core i7 and DLX processors. We also describe our FPGA design flow and testing approach.
Keywords
field programmable gate arrays; microprocessor chips; multi-threading; parallel programming; shared memory systems; synchronisation; CESM CMP; CESM architecture TOTAL ECLIPSE; CESM scheme; DLX processors; FPGA design flow approach; FPGA implementations; FPGA testing approach; Intel Core i7 processors; MBTAC processor prototyping; MBTAC processors; REPLICA CESM architecture; REPLICA CMP; asynchronous thread computational model; chip multiprocessors; configurable emulated shared memory CMP; intercommunication overheads; latency-insensitive computation; low-overhead multithreading; machine instruction-level synchronization; memory references; multiprefix operations; parallel computing savvy functional unit organization; parallel programming; scalable general purpose computing; sequential core processors; single core processors; uniform synchronous shared memory; Field programmable gate arrays; Instruction sets; Memory management; Phase change random access memory; Prototypes; Synchronization; FPGA prototype; NUMA; PRAM; chaining; multithreaded processor; parallel computing;
fLanguage
English
Publisher
ieee
Conference_Titel
Parallel & Distributed Processing Symposium Workshops (IPDPSW), 2014 IEEE International
Conference_Location
Phoenix, AZ
Print_ISBN
978-1-4799-4117-9
Type
conf
DOI
10.1109/IPDPSW.2014.82
Filename
6969452