• DocumentCode
    177354
  • Title

    STAG: Spintronic-Tape Architecture for GPGPU cache hierarchies

  • Author

    Venkatesan, R. ; Ramasubramanian, Shankar Ganesh ; Venkataramani, Swagath ; Roy, Kaushik ; Raghunathan, Anand

  • Author_Institution
    Sch. of Electr. & Comput. Eng., Purdue Univ., West Lafayette, IN, USA
  • fYear
    2014
  • fDate
    14-18 June 2014
  • Firstpage
    253
  • Lastpage
    264
  • Abstract
    General-purpose Graphics Processing Units (GPGPUs) are widely used for executing massively parallel workloads from various application domains. Feeding data to the hundreds to thousands of cores that current GPGPUs integrate places great demands on the memory hierarchy, fueling an ever-increasing demand for on-chip memory. In this work, we propose STAG, a high-density, energy-efficient GPGPU cache hierarchy design using a new spintronic memory technology called Domain Wall Memory (DWM). DWMs inherently offer unprecedented benefits in density by storing multiple bits in the domains of a ferromagnetic nanowire, which logically resembles a bit-serial tape. However, this structure also poses a unique challenge: the bits must be accessed sequentially by performing “shift” operations, resulting in variable and potentially higher access latencies. To address this challenge, STAG utilizes a number of architectural techniques: (i) a hybrid cache organization that employs different DWM bit-cells to realize the different memory arrays within the GPGPU cache hierarchy, (ii) a clustered, bit-interleaved organization, in which the bits in a cache block are spread across a cluster of DWM tapes, allowing parallel access, (iii) tape head management policies that predictively configure DWM arrays to reduce the expected number of shift operations for subsequent accesses, and (iv) a shift-aware promotion buffer (SaPB), in which accesses to the DWM cache are predicted based on intra-warp locality, and locations that would incur a large shift penalty are promoted to a smaller buffer. Over a wide range of benchmarks from the Rodinia, ISPASS and Parboil suites, STAG achieves significant benefits in performance (12.1% over SRAM and 5.8% over STT-MRAM) and energy (3.3X over SRAM and 2.6X over STT-MRAM).
  • Keywords
    cache storage; ferromagnetic materials; graphics processing units; magnetoelectronics; nanowires; DWM; GPGPU cache hierarchies; STAG; cache organization; domain wall memory; ferromagnetic nanowire; general-purpose graphics processing units; memory hierarchy; on-chip memory; spintronic-tape architecture; Arrays; Magnetic domains; Magnetic tunneling; Organizations; Random access memory; Transistors; Wires
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Architecture (ISCA), 2014 ACM/IEEE 41st International Symposium on
  • Conference_Location
    Minneapolis, MN
  • Print_ISBN
    978-1-4799-4396-8
  • Type

    conf

  • DOI
    10.1109/ISCA.2014.6853233
  • Filename
    6853233
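
  • Illustration
    The shift-based access that the abstract describes can be illustrated with a toy model. This is only a sketch under stated assumptions, not the STAG design: it assumes a single tape with one access port, charges one shift per domain moved, and compares two hypothetical head policies ("stay" leaves the head at the last accessed domain, "center" moves it back to the middle of the tape between accesses). The function name, the policy names, and the cost model are all illustrative and do not come from the paper.

```python
def count_shifts(accesses, tape_len, policy="stay"):
    """Toy model of one DWM tape with a single access port.

    Returns (demand_shifts, background_shifts):
      demand_shifts     - shifts performed while a request is waiting
      background_shifts - shifts spent repositioning the head between requests

    Assumption: one shift moves the tape by exactly one domain.
    The policy names "stay" and "center" are hypothetical labels.
    """
    if policy not in ("stay", "center"):
        raise ValueError(f"unknown policy: {policy!r}")

    center = tape_len // 2
    head = 0 if policy == "stay" else center
    demand = 0
    background = 0

    for idx in accesses:
        if not 0 <= idx < tape_len:
            raise ValueError(f"domain index {idx} out of range")
        # Shifts needed to bring the requested domain under the port.
        demand += abs(idx - head)
        head = idx
        if policy == "center":
            # Reposition to the middle before the next request arrives.
            background += abs(head - center)
            head = center

    return demand, background


if __name__ == "__main__":
    pattern = [0, 7, 0, 7]  # alternate between the two ends of an 8-domain tape
    print(count_shifts(pattern, 8, "stay"))
    print(count_shifts(pattern, 8, "center"))
```

    In this toy model, repositioning the head lowers the shifts seen by a waiting request at the cost of extra background shifts, which is the kind of latency and energy trade-off a head management policy has to weigh.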