DocumentCode :
703864
Title :
Soft-error reliability and power co-optimization for GPGPUs register file using resistive memory
Author :
Jingweijia Tan ; Zhi Li ; Xin Fu
Author_Institution :
ECE Dept., Univ. of Houston, Houston, TX, USA
fYear :
2015
fDate :
9-13 March 2015
Firstpage :
369
Lastpage :
374
Abstract :
The increasing adoption of graphics processing units (GPUs) for high-performance computing raises a reliability challenge that has generally been ignored in traditional GPUs. GPUs usually support thousands of parallel threads and require a sizable register file. Such a large register file is both highly susceptible to soft errors and power-hungry. Although ECC has been adopted for the register file in modern GPUs, it causes considerable power overhead, which further increases the power stress. Thus, an energy-efficient soft-error protection mechanism is more desirable. Besides its extremely low leakage power consumption, resistive memory (e.g. spin-transfer torque RAM) is also immune to radiation-induced soft errors due to its magnetic-field-based storage. In this paper, we propose to LEverage reSistive memory to enhance the Soft-error robustness and reduce the power consumption (LESS) of registers in General-Purpose computing on GPUs (GPGPUs). Since resistive memory experiences longer write latency than SRAM, we explore the unique characteristics of GPGPU applications to obtain win-win gains: achieving near-full soft-error protection for the register file while substantially reducing energy consumption with negligible performance loss. Our experimental results show that LESS is able to reduce the registers' soft-error vulnerability by 86% and achieve 60% energy savings with negligible (e.g. 4%) performance loss.
Keywords :
energy conservation; graphics processing units; integrated circuit reliability; power aware computing; radiation hardening (electronics); GPGPU register file; energy savings; energy-efficient soft-error protection mechanism; general-purpose computing; graphics processing units; high-performance computing; leverage resistive memory; low leakage power consumption; magnetic field based storage; power cooptimization; power stress; registers soft-error vulnerability; resistive memory; soft-error protection; soft-error reliability; Benchmark testing; Error correction codes; Instruction sets; Radio frequency; Random access memory; Registers; Reliability; Energy Efficiency; GPGPU; Register File; Reliability; Resistive Memory; Soft Error;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Design, Automation & Test in Europe Conference & Exhibition (DATE), 2015
Conference_Location :
Grenoble
Print_ISBN :
978-3-9815-3704-8
Type :
conf
Filename :
7092416