DocumentCode :
10561
Title :
Microarchitecture of a Coarse-Grain Out-of-Order Superscalar Processor
Author :
Capalija, Davor ; Abdelrahman, Tarek S.
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of Toronto, Toronto, ON, Canada
Volume :
24
Issue :
2
fYear :
2013
fDate :
Feb. 2013
Firstpage :
392
Lastpage :
405
Abstract :
We explore the design, implementation, and evaluation of a coarse-grain superscalar processor in the context of the microarchitecture of the Control Processor (CP) of the Multilevel Computing Architecture (MLCA), a novel architecture targeted for multimedia multicore systems. The MLCA augments a traditional multicore architecture (called the lower level) with a CP (called the top-level), which automatically extracts parallelism among coarse-grain units of computation (tasks), synchronizes these tasks and schedules them for execution on processors. It does so in a fashion similar to how instruction-level parallelism is extracted by superscalar processors, i.e., using register renaming, Out-of-Order Execution (OoOE) and scheduling. The coarse-grain nature of tasks imposes challenging constraints on the direct use of these techniques, but also offers opportunities for simpler designs. We analyze the impact of these constraints and opportunities and present novel microarchitectural mechanisms for coarse-grain superscalar execution, including register renaming, task queue, dynamic out-of-order scheduling and task-issue. We design an MLCA system around our CP microarchitecture and implement it on an FPGA. We evaluate the system using multimedia applications and show good scalability for eight processors, limited by the memory bandwidth of the FPGA platform. Furthermore, we show that the CP introduces little overhead in terms of resource usage. Finally, we show scalability beyond eight processors using cycle-accurate RTL-level simulation with an idealized memory subsystem. We demonstrate that the CP poses no performance bottlenecks and is scalable up to 32 processors.
Keywords :
field programmable gate arrays; multimedia systems; multiprocessing systems; parallel processing; scheduling; CP microarchitecture; FPGA; MLCA system; OoOE; coarse-grain out-of-order superscalar processor microarchitecture; coarse-grain superscalar execution; control processor; cycle-accurate RTL-level simulation; dynamic out-of-order scheduling; idealized memory subsystem; instruction-level parallelism; lower level; multilevel computing architecture; multimedia multicore systems; out-of-order execution; register renaming; resource usage; task queue; task-issue; top-level; Clocks; Dynamic scheduling; Microarchitecture; Parallel processing; Programming; Registers; Throughput; Coarse-grain parallelism; out-of-order execution; register renaming; task-level superscalar execution;
fLanguage :
English
Journal_Title :
Parallel and Distributed Systems, IEEE Transactions on
Publisher :
IEEE
ISSN :
1045-9219
Type :
jour
DOI :
10.1109/TPDS.2012.135
Filename :
6193098