Regional Information Center for Science and Technology (RICEST) - Exploiting criticality to reduce bottlenecks in distributed uniprocessors

DocumentCode :

2947401

Title :

Exploiting criticality to reduce bottlenecks in distributed uniprocessors

Author :

Robatmili, Behnam ; Govindan, Sibi ; Burger, Doug ; Keckler, Stephen W.

Author_Institution :

Dept. of Comput. Sci., Univ. of Texas at Austin, Austin, TX, USA

fYear :

2011

fDate :

12-16 Feb. 2011

Firstpage :

431

Lastpage :

442

Abstract :

Composable multicore systems merge multiple independent cores for running sequential single-threaded workloads. The performance scalability of these systems, however, is limited due to partitioning overheads. This paper addresses two of the key performance scalability limitations of composable multicore systems. We present a critical path analysis revealing that communication needed for cross-core register value delivery and fetch stalls due to misspeculation are the two worst bottlenecks that prevent efficient scaling to a large number of fused cores. To alleviate these bottlenecks, this paper proposes a fully distributed framework to exploit criticality in these architectures at different granularities. A coordinator core exploits different types of block-level communication criticality information to fine-tune critical instructions at decode and register forward pipeline stages of their executing cores. The framework exploits the fetch criticality information at a coarser granularity by reissuing all instructions in the blocks previously fetched into the merged cores. This general framework reduces competing bottlenecks in a synergic manner and achieves scalable performance/power efficiency for sequential programs when running across a large number of cores.

Keywords :

microprocessor chips; multiprocessing systems; performance evaluation; pipeline processing; block level communication criticality information; coarse granularity; composable multicore system; critical path analysis; cross core register value delivery; distributed uniprocessor; fetch criticality information; fetch stalls; fine tune critical instruction; misspeculation; partitioning overhead; performance scalability limitation; sequential single threaded workload; Bandwidth; Benchmark testing; Hardware; Microarchitecture; Multicore processing; Pipelines; Registers;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

High Performance Computer Architecture (HPCA), 2011 IEEE 17th International Symposium on

Conference_Location :

San Antonio, TX

ISSN :

1530-0897

Print_ISBN :

978-1-4244-9432-3

Type :

conf

DOI :

10.1109/HPCA.2011.5749749

Filename :

5749749

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2947401