How to implement effective prediction and forwarding for fusable dynamic multicore architectures

Author

Robatmili, B. ; Dong Li ; Esmaeilzadeh, H. ; Govindan, S. ; Smith, A. ; Putnam, A. ; Burger, Doug ; Keckler, Stephen W.

Author_Institution

Qualcomm Res. Silicon Valley, CA, USA

fYear

2013

fDate

23-27 Feb. 2013

Firstpage

460

Lastpage

471

Abstract

Dynamic multicore architectures, which fuse and split cores at run time, potentially offer a level of performance/energy agility that static multicore designs cannot achieve. Conventional ISAs, however, limit how far fusion can scale. EDGE-based designs offer greater scalability but to date have been performance-limited by significant microarchitectural bottlenecks. This paper addresses these issues and makes three major contributions. First, it proposes Iterative Path Prediction to address low next-block prediction accuracy and low speculation rates. It achieves close to taken/not-taken prediction accuracy for multi-exit instruction blocks while also speculating the predicated execution path within the block. Second, the paper proposes Exposed Operand Broadcasts to address the overhead of operand delivery for high-fanout instructions by exposing a small number of broadcast operands in the ISA. Third, the paper presents a scalable composable architecture called T3 that uses these mechanisms and shows that it can operate across a wide power and performance spectrum while significantly increasing energy efficiency and performance. Compared to previous EDGE designs, T3 improves energy efficiency by about 2x and performance by up to 50%.

Keywords

energy conservation; instruction sets; integrated circuit design; iterative methods; multiprocessing systems; performance evaluation; power aware computing; EDGE-based designs; ISAs; T3 architecture; broadcast operands; core fusion; core splitting; energy efficiency; execution path; exposed operand broadcasts; fusable dynamic multicore architectures; iterative path prediction; low speculation rates; microarchitectural bottlenecks; multiexit instruction blocks; operand delivery; performance spectrum; performance-energy agility; power spectrum; prediction accuracy; scalable composable architecture; static multicore designs; Accuracy; History; Microarchitecture; Multicore processing; Out of order; Registers;

fLanguage

English

Publisher

IEEE

Conference_Titel

High Performance Computer Architecture (HPCA2013), 2013 IEEE 19th International Symposium on

Conference_Location

Shenzhen

ISSN

1530-0897

Print_ISBN

978-1-4673-5585-8

Type

conf

DOI

10.1109/HPCA.2013.6522341

Filename

6522341