DocumentCode
3663935
Title
The Load Slice Core microarchitecture
Author
Trevor E. Carlson;Wim Heirman;Osman Allam;Stefanos Kaxiras;Lieven Eeckhout
Author_Institution
Uppsala University, Sweden
fYear
2015
fDate
6/1/2015
Firstpage
272
Lastpage
284
Abstract
Driven by the need to expose instruction-level parallelism (ILP), microprocessor cores have evolved from simple, in-order pipelines into complex, superscalar out-of-order designs. By extracting ILP, these processors also enable parallel cache and memory operations as a useful side effect. Today, however, the growing off-chip memory wall and complex cache hierarchies of many-core processors make cache and memory accesses ever more costly. This increases the importance of extracting memory hierarchy parallelism (MHP), while reducing the net impact of more general, yet complex and power-hungry ILP-extraction techniques. In addition, for multi-core processors operating in power- and energy-constrained environments, energy efficiency has largely replaced single-thread performance as the primary concern. Based on this observation, we propose a core microarchitecture that is aimed squarely at generating parallel accesses to the memory hierarchy while maximizing energy efficiency. The Load Slice Core extends the efficient in-order, stall-on-use core with a second in-order pipeline that enables memory accesses and address-generating instructions to bypass stalled instructions in the main pipeline. Backward program slices containing address-generating instructions leading up to loads and stores are extracted automatically by the hardware, using a novel iterative algorithm that requires no software support or recompilation. On average, the Load Slice Core improves performance over a baseline in-order processor by 53% with overheads of only 15% in area and 22% in power, increasing energy efficiency (MIPS/Watt) by 43% over in-order designs and by more than 4.7× over out-of-order designs. In addition, for a power- and area-constrained many-core design, the Load Slice Core outperforms both in-order and out-of-order designs, achieving 53% and 95% higher performance, respectively, thus providing an alternative direction for future many-core processors.
Keywords
"Registers","Out of order","Parallel processing","Random access memory","Radio frequency","Microarchitecture"
Publisher
IEEE
Conference_Title
Computer Architecture (ISCA), 2015 ACM/IEEE 42nd Annual International Symposium on
Type
conf
DOI
10.1145/2749469.2750407
Filename
7284072