DocumentCode
177354
Title
STAG: Spintronic-Tape Architecture for GPGPU cache hierarchies
Author
Venkatesan, R. ; Ramasubramanian, Shankar Ganesh ; Venkataramani, Swagath ; Roy, Kaushik ; Raghunathan, Anand
Author_Institution
Sch. of Electr. & Comput. Eng., Purdue Univ., West Lafayette, IN, USA
fYear
2014
fDate
14-18 June 2014
Firstpage
253
Lastpage
264
Abstract
General-purpose Graphics Processing Units (GPGPUs) are widely used for executing massively parallel workloads from various application domains. Feeding data to the hundreds to thousands of cores that current GPGPUs integrate places great demands on the memory hierarchy, fueling an ever-increasing demand for on-chip memory. In this work, we propose STAG, a high density, energy-efficient GPGPU cache hierarchy design using a new spintronic memory technology called Domain Wall Memory (DWM). DWMs inherently offer unprecedented benefits in density by storing multiple bits in the domains of a ferromagnetic nanowire, which logically resembles a bit-serial tape. However, this structure also leads to a unique challenge that the bits must be sequentially accessed by performing “shift” operations, resulting in variable and potentially higher access latencies. To address this challenge, STAG utilizes a number of architectural techniques : (i) a hybrid cache organization that employs different DWM bit-cells to realize the different memory arrays within the GPGPU cache hierarchy, (ii) a clustered, bit-interleaved organization, in which the bits in a cache block are spread across a cluster of DWM tapes, allowing parallel access, (iii) tape head management policies that predictively configure DWM arrays to reduce the expected number of shift operations for subsequent accesses, and (iv) a shift aware promotion buffer (SaPB), in which accesses to the DWM cache are predicted based on intra-warp locality, and locations that would incur a large shift penalty are promoted to a smaller buffer. Over a wide range of benchmarks from the Rodinia, ISPASS and Parboil suites, STAG achieves significant benefits in performance (12.1% over SRAM and 5.8% over STT-MRAM) and energy (3.3X over SRAM and 2.6X over STT-MRAM).
Keywords
cache storage; ferromagnetic materials; graphics processing units; magnetoelectronics; nanowires; DWM; GPGPU cache hierarchies; STAG; cache organization; domain wall memory; ferromagnetic nanowire; general-purpose graphics processing units; memory hierarchy; on-chip memory; spintronic-tape architecture; Arrays; Magnetic domains; Magnetic tunneling; Organizations; Random access memory; Transistors; Wires;
fLanguage
English
Publisher
ieee
Conference_Titel
Computer Architecture (ISCA), 2014 ACM/IEEE 41st International Symposium on
Conference_Location
Minneapolis, MN
Print_ISBN
978-1-4799-4396-8
Type
conf
DOI
10.1109/ISCA.2014.6853233
Filename
6853233
Link To Document