DocumentCode
3118520
Title
Experimental evaluation of GPUs radiation sensitivity and algorithm-based fault tolerance efficiency
Author
Rech, P. ; Carro, Luigi
Author_Institution
Univ. Fed. do Rio Grande do Sul, Porto Alegre, Brazil
fYear
2013
fDate
8-10 July 2013
Firstpage
244
Lastpage
247
Abstract
Experimental results demonstrate that Graphics Processing Units (GPUs) are highly prone to corruption by neutrons. We performed several experimental campaigns at ISIS, UK, and at LANSCE, Los Alamos, NM, USA, assessing the sensitivity of the GPU internal resources as well as the error rate of common parallel algorithms. The experiments highlight output error patterns and radiation responses that can be used to design optimized Algorithm-Based Fault Tolerance strategies and to provide pragmatic programming guidelines that increase code reliability with low computational overhead.
Keywords
computational complexity; error correction codes; fault tolerant computing; graphics processing units; parallel algorithms; performance evaluation; radiation effects; GPU internal resource sensitivity; GPU radiation sensitivity; ISIS UK; LANSCE Los Alamos NM USA; algorithm-based fault tolerance efficiency; code reliability; computational overhead; design optimized algorithm-based fault tolerance strategies; error rate; experimental evaluation; graphic processing units; output error patterns; parallel algorithms; radiation responses; Error correction codes; Graphics processing units; Instruction sets; Neutrons; Parallel processing; Reliability; Sensitivity; GPU; multiple errors; neutron sensitivity; software-based hardening;
fLanguage
English
Publisher
IEEE
Conference_Titel
On-Line Testing Symposium (IOLTS), 2013 IEEE 19th International
Conference_Location
Chania
Type
conf
DOI
10.1109/IOLTS.2013.6604091
Filename
6604091