DocumentCode
703864
Title
Soft-error reliability and power co-optimization for GPGPUs register file using resistive memory
Author
Jingweijia Tan ; Zhi Li ; Xin Fu
Author_Institution
ECE Dept., Univ. of Houston, Houston, TX, USA
fYear
2015
fDate
9-13 March 2015
Firstpage
369
Lastpage
374
Abstract
The increasing adoption of graphics processing units (GPUs) for high-performance computing raises a reliability challenge that is generally ignored in traditional GPUs. GPUs usually support thousands of parallel threads and require a sizable register file. Such a large register file is both highly susceptible to soft errors and power-hungry. Although ECC has been adopted for the register file in modern GPUs, it incurs considerable power overhead, which further increases the power stress. Thus, an energy-efficient soft-error protection mechanism is more desirable. Besides its extremely low leakage power consumption, resistive memory (e.g., spin-transfer torque RAM) is also immune to radiation-induced soft errors due to its magnetic-field-based storage. In this paper, we propose to LEverage reSistive memory to enhance the Soft-error robustness and reduce the power consumption (LESS) of registers in General-Purpose computing on GPUs (GPGPUs). Since resistive memory has a longer write latency than SRAM, we exploit the unique characteristics of GPGPU applications to obtain win-win gains: achieving near-full soft-error protection for the register file while substantially reducing energy consumption with negligible performance loss. Our experimental results show that LESS is able to reduce the register file's soft-error vulnerability by 86% and achieve 60% energy savings with negligible (e.g., 4%) performance loss.
Keywords
energy conservation; graphics processing units; integrated circuit reliability; power aware computing; radiation hardening (electronics); GPGPU register file; energy savings; energy-efficient soft-error protection mechanism; general-purpose computing; graphics processing units; high-performance computing; leverage resistive memory; low leakage power consumption; magnetic field based storage; power cooptimization; power stress; registers soft-error vulnerability; resistive memory; soft-error protection; soft-error reliability; Benchmark testing; Error correction codes; Instruction sets; Radio frequency; Random access memory; Registers; Reliability; Energy Efficiency; GPGPU; Register File; Reliability; Resistive Memory; Soft Error;
fLanguage
English
Publisher
IEEE
Conference_Titel
Design, Automation & Test in Europe Conference & Exhibition (DATE), 2015
Conference_Location
Grenoble
Print_ISBN
978-3-9815-3704-8
Type
conf
Filename
7092416