Regional Information Center for Science and Technology - NOC-Out: Microarchitecting a Scale-Out Processor

DocumentCode :

1882322

Title :

NOC-Out: Microarchitecting a Scale-Out Processor

Author :

Lotfi-Kamran, Pejman ; Grot, Boris ; Falsafi, Babak

Year :

2012

Date :

1-5 Dec. 2012

Firstpage :

177

Lastpage :

187

Abstract :

Scale-out server workloads benefit from many-core processor organizations that enable high throughput thanks to abundant request-level parallelism. A key characteristic of these workloads is the large instruction footprint that exceeds the capacity of private caches. While a shared last-level cache (LLC) can capture the instruction working set, it necessitates a low-latency interconnect fabric to minimize the core stall time on instruction fetches serviced by the LLC. Many-core processors with a mesh interconnect sacrifice performance on scale-out workloads due to NOC-induced delays. Low-diameter topologies can overcome the performance limitations of meshes through rich inter-node connectivity, but at a high area expense. To address the drawbacks of existing designs, this work introduces NOC-Out - a many-core processor organization that affords low LLC access delays at a small area cost. NOC-Out is tuned to accommodate the bilateral core-to-cache access pattern, characterized by minimal coherence activity and lack of inter-core communication, that is dominant in scale-out workloads. Optimizing for the bilateral access pattern, NOC-Out segregates cores and LLC banks into distinct network regions and reduces costly network connectivity by eliminating the majority of inter-core links. NOC-Out further simplifies the interconnect through the use of low-complexity tree-based topologies. A detailed evaluation targeting a 64-core CMP and a set of scale-out workloads reveals that NOC-Out improves system performance by 17% and reduces network area by 28% over a tiled mesh-based design. Compared to a design with a richly-connected flattened butterfly topology, NOC-Out reduces network area by 9× while matching the performance.

Keywords :

cache storage; instruction sets; network topology; network-on-chip; LLC access delays; LLC banks; NOC-Out; NOC-induced delays; bilateral access pattern; bilateral core-to-cache access pattern; coherence activity; core stall time; instruction working set; intercore communication; intercore links; internode connectivity; low-complexity tree-based topologies; low-diameter topologies; low-latency interconnect fabric; many-core processor organizations; mesh interconnect; network connectivity; private cache capacity; request-level parallelism; richly-connected flattened butterfly topology; scale-out processor; scale-out server workloads; shared LLC; shared last-level cache; tiled mesh-based design

Language :

English

Publisher :

IEEE

Conference_Title :

2012 45th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)

Conference_Location :

Vancouver, BC

ISSN :

1072-4451

Print_ISBN :

978-1-4673-4819-5

Type :

conf

DOI :

10.1109/MICRO.2012.25

Filename :

6493618

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1882322