Title :
Apple-CORE: Microgrids of SVP Cores -- Flexible, General-Purpose, Fine-Grained Hardware Concurrency Management
Author :
Poss, Raphael ; Lankamp, Mike ; Yang, Qiang ; Fu, Jian ; Van Tol, Michiel W. ; Jesshope, Chris
Author_Institution :
Inst. for Inf., Univ. of Amsterdam, Amsterdam, Netherlands
Abstract :
To harness the potential of CMPs for scalable, energy-efficient performance in general-purpose computers, the Apple-CORE project has co-designed a general machine model and concurrency control interface with dedicated hardware support for concurrency management across multiple cores. Its SVP interface combines dataflow synchronisation with imperative programming, towards the efficient use of parallelism in general-purpose workloads. The corresponding hardware implementation provides logic able to coordinate single-issue, in-order multi-threaded RISC cores into computation clusters on chip, called Microgrids. In contrast with the traditional "accelerator" approach, Microgrids are intended to be used as components in distributed systems on chip that consider both clusters of small cores and optional larger cores optimized towards sequential performance as system services shared between applications. The key aspects of the design are asynchrony, i.e. the ability to tolerate operations with irregular long latencies, a scale-invariant programming model, a distributed vision of the chip's structure, and the transparent performance scaling of a single program binary code across multiple cluster sizes. This paper describes the execution model, the core micro-architecture, its realization in a many-core, general-purpose processor chip and its software environment. The reference chip parameters include 128 cores, a 4 MB on-chip distributed cache network and four DDR3-1600 memory channels. This paper presents cycle-accurate simulation results for various key algorithmic and cryptographic kernels. The results show good efficiency in terms of the utilization of hardware despite the high-latency memory accesses and good scalability across relatively large clusters of cores.
Keywords :
binary codes; cache storage; computer power supplies; concurrency control; data flow computing; distributed memory systems; distributed power generation; energy conservation; memory architecture; multi-threading; power engineering computing; power grids; reduced instruction set computing; Apple-CORE project; CMP; DDR3-1600 memory channels; SVP interface; accelerator approach; chip multiprocessor; computation cluster on chip; concurrency control interface; concurrency management; core microarchitecture; cryptographic kernels; dataflow synchronisation; distributed systems on chip; energy efficiency; general machine model; general purpose computer; general purpose processor chip; hardware implementation; irregular long latency; manycore; memory access; microgrid; multithreaded RISC core; on-chip distributed cache network; optimization; parallel programming; program binary code; scale invariant programming model; sequential performance; transparent performance scaling; Concurrent computing; Context; Hardware; Instruction sets; Registers; Synchronization; concurrency; hardware multithreading; many-core; microthreads; multi-core
Conference_Title :
Digital System Design (DSD), 2012 15th Euromicro Conference on
Conference_Location :
Izmir
Print_ISBN :
978-1-4673-2498-4
DOI :
10.1109/DSD.2012.25