DocumentCode :
166200
Title :
GPU-Qin: A methodology for evaluating the error resilience of GPGPU applications
Author :
Fang, Bo ; Pattabiraman, Karthik ; Ripeanu, Matei ; Gurumurthi, Sudhanva
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of British Columbia, Vancouver, BC, Canada
fYear :
2014
fDate :
23-25 March 2014
Firstpage :
221
Lastpage :
230
Abstract :
While graphics processing units (GPUs) have gained wide adoption as accelerators for general-purpose applications (GPGPU), the end-to-end reliability implications of their use have not been quantified. Fault injection is a widely used method for evaluating the reliability of applications. However, building a fault injector for GPGPU applications is challenging due to their massive parallelism, which makes it difficult to achieve representativeness while being time-efficient. This paper makes three key contributions. First, it presents the design of a fault-injection methodology to evaluate end-to-end reliability properties of application kernels running on GPUs. Second, it introduces a fault-injection tool that uses real GPU hardware and offers a good balance between the representativeness and the efficiency of the fault injection experiments. Third, this paper characterizes the error resilience characteristics of twelve GPGPU applications.
Keywords :
fault tolerant computing; graphics processing units; GPGPU applications; GPU-Qin methodology; end-to-end reliability implications; error resilience evaluation; fault-injection methodology; general-purpose graphics processing units; Graphics processing units; Hardware; Instruction sets; Parallel processing; Registers; Resilience; Transient analysis;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Performance Analysis of Systems and Software (ISPASS), 2014 IEEE International Symposium on
Conference_Location :
Monterey, CA
Print_ISBN :
978-1-4799-3604-5
Type :
conf
DOI :
10.1109/ISPASS.2014.6844486
Filename :
6844486