Regional Information Center for Science and Technology - Breaking the on-chip latency barrier using SMART

DocumentCode :

602617

Title :

Breaking the on-chip latency barrier using SMART

Author :

Krishna, Tushar ; Chen, Chia-Hsin Owen ; Woo Cheol Kwon ; Li-Shiuan Peh

Author_Institution :

Comput. Sci. & Artificial Intell. Lab. (CSAIL), Massachusetts Inst. of Technol., Cambridge, MA, USA

fYear :

2013

fDate :

23-27 Feb. 2013

Firstpage :

378

Lastpage :

389

Abstract :

As the number of on-chip cores increases, scalable on-chip topologies such as meshes inevitably add multiple hops in each network traversal. The best we can do right now is to design 1-cycle routers, such that the low-load network latency between a source and destination is equal to the number of routers + links (i.e., hops×2) between them. OS/compiler and cache coherence protocol designers often try to limit communication to within a few hops, since on-chip latency is critical for their scalability. In this work, we propose an on-chip network called SMART (Single-cycle Multi-hop Asynchronous Repeated Traversal) that aims to present a single-cycle data-path all the way from the source to the destination. We do not add any additional fast physical express links in the data-path; instead we drive the shared crossbars and links asynchronously up to multiple hops within a single cycle. We design a router + link microarchitecture to achieve such a traversal, and a flow-control technique to arbitrate and set up multi-hop paths within a cycle. A place-and-routed design at 45nm achieves 11 hops within a 1GHz cycle for paths without turns (9 for paths with turns). We observe 5-8X reduction in low-load latencies across synthetic traffic patterns on an 8×8 CMP, compared to a baseline 1-cycle router. Full-system simulations with SPLASH-2 and PARSEC benchmarks demonstrate 27/52% and 20/59% reduction in runtime and EDP for Private/Shared L2 designs.
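The latency arithmetic in the abstract can be illustrated with a minimal sketch. The baseline formula (hops×2) is taken from the abstract. The multi-hop function is an assumed, idealized model that only counts how many cycles are needed if up to `hpc_max` hops are covered per cycle; it ignores path setup, arbitration, and contention, and is not the paper's evaluation method. All function and parameter names are hypothetical.

```python
import math

def mesh_hops(src, dst):
    """Hop count between two (x, y) nodes of a mesh (Manhattan distance)."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])

def baseline_latency_cycles(hops):
    """Low-load latency with 1-cycle routers, as stated in the abstract:
    one cycle per router plus one cycle per link, i.e. hops x 2."""
    return 2 * hops

def ideal_multihop_cycles(hops, hpc_max):
    """ASSUMPTION: idealized data-path cycles if up to hpc_max hops are
    traversed per cycle. Setup, arbitration and contention are ignored."""
    return math.ceil(hops / hpc_max)

# Corner-to-corner traversal on an 8x8 mesh.
hops = mesh_hops((0, 0), (7, 7))
baseline = baseline_latency_cycles(hops)
# hpc_max = 11 is the straight-path figure reported for the 45nm design;
# a corner-to-corner path includes a turn, so this is an optimistic bound.
ideal = ideal_multihop_cycles(hops, hpc_max=11)
print(hops, baseline, ideal)
```

Because the idealized function omits every per-traversal overhead, its output should be read as a lower bound on data-path cycles and not compared directly with the 5-8X reduction reported above.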

Keywords :

cache storage; computer architecture; microprocessor chips; network routing; program compilers; 1-cycle routers; CMP; EDP; OS-compiler; PARSEC benchmarks; SMART; SPLASH-2 benchmarks; cache coherence protocols designers; low-load network latency; on-chip cores; on-chip latency barrier; place-and-routed design; private-shared L2 designs; router+link microarchitecture; scalable on-chip topologies; shared crossbars; single-cycle multihop asynchronous repeated traversal; Delays; Pipelines; Repeaters; Runtime; Switches; System-on-chip; Wires;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

High Performance Computer Architecture (HPCA2013), 2013 IEEE 19th International Symposium on

Conference_Location :

Shenzhen

ISSN :

1530-0897

Print_ISBN :

978-1-4673-5585-8

Type :

conf

DOI :

10.1109/HPCA.2013.6522334

Filename :

6522334

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=602617