DocumentCode :
3611903
Title :
Memory Access Time and Input Size Effects on Parallel Processors Reliability
Author :
Pilla, Laercio L. ; Oliveira, Daniel A. G. ; Lunardi, Caio ; Navaux, Philippe O. A. ; Carro, Luigi ; Rech, Paolo
Author_Institution :
Dept. de Inf. e Estatistica, Fed. Univ. of Santa Catarina, Florianópolis, Brazil
Volume :
62
Issue :
6
fYear :
2015
Firstpage :
2627
Lastpage :
2634
Abstract :
In this paper, we evaluate the effects of reducing the average memory access time (AMAT) on the performance and reliability of graphics processing units (GPUs), based on data obtained at the Los Alamos Neutron Science Center (LANSCE). We also measure the effects of input size changes on the neutron radiation sensitivity of a GPU running different applications. Results show an increase in the silent data corruption (SDC) cross section with AMAT optimizations, resulting from a higher usage of unprotected registers and SRAM memory resources, and an increase in the single event functional interruption (SEFI) cross section of applications that did not saturate the scheduling resources of the GPU. Based on the reported execution time changes and cross section increases, we extend the reliability analysis of parallel processors by proposing the mean workload between failures (MWBF) metric, which evaluates the amount of data correctly computed before a failure occurs. Optimizations lead to more stable MWBF values, indicating better reliability than nonoptimized codes when processing large inputs.
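The MWBF metric described above weighs a higher cross section against a shorter execution time. The following is a minimal sketch, assuming MWBF is computed as the data processed per execution divided by the expected number of failures per execution (cross section × flux × execution time); the class `BeamRun`, its fields, and every number below are hypothetical illustrations, not the paper's code or measured data.

```python
from dataclasses import dataclass


@dataclass
class BeamRun:
    """Hypothetical summary of one code version tested under a neutron beam."""

    errors: int           # observed failures of one type (e.g., SDCs)
    fluence: float        # total neutrons/cm^2 received during the test
    exec_time_s: float    # duration of a single execution, in seconds
    data_per_exec: float  # workload per execution (e.g., number of input elements)

    def cross_section(self) -> float:
        """Cross section in cm^2: observed errors divided by fluence."""
        if self.errors <= 0 or self.fluence <= 0:
            raise ValueError("need at least one observed error and a positive fluence")
        return self.errors / self.fluence

    def failures_per_exec(self, flux: float) -> float:
        """Expected failures in one execution (assumed: sigma * flux * t_exec)."""
        return self.cross_section() * flux * self.exec_time_s

    def mwbf(self, flux: float) -> float:
        """Assumed MWBF: data per execution divided by expected failures per execution."""
        return self.data_per_exec / self.failures_per_exec(flux)


if __name__ == "__main__":
    # Flux in n/(cm^2*s); ~13 n/(cm^2*h) is the JESD89A sea-level reference (>10 MeV).
    flux = 13 / 3600
    # Hypothetical numbers: the optimized code has a larger cross section
    # but finishes the same input in half the time.
    naive = BeamRun(errors=40, fluence=1e11, exec_time_s=4.0, data_per_exec=1e6)
    optimized = BeamRun(errors=60, fluence=1e11, exec_time_s=2.0, data_per_exec=1e6)
    for name, run in (("naive", naive), ("optimized", optimized)):
        print(f"{name}: sigma={run.cross_section():.2e} cm^2, MWBF={run.mwbf(flux):.3e}")
```

Because flux and execution time multiply, they must use the same time unit; the ratio between the MWBF values of two code versions does not depend on the flux chosen, so the sketch can compare versions without committing to a particular environment.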
Keywords :
SRAM chips; fault tolerant computing; flip-flops; graphics processing units; parallel processing; parallelising compilers; storage management; system recovery; AMAT optimization; GPU performance; GPU reliability; GPU scheduling resource; LANSCE; Los Alamos Neutron Science Center; MWBF metric; SDC cross section; SEFI cross section; SRAM memory resources; average memory access time reduction; execution time changes; graphics processing units; input size effect; mean workload between failures metric; neutron radiation sensitivity; nonoptimized codes; parallel processor reliability; reliability analysis; silent data corruption; single event functional interruption; unprotected registers; Fault tolerance; Graphics processing units; Neutron radiation effects; Parallel processing; Reliability; Fault-tolerance; graphics processing unit (GPU); neutron sensitivity; parallel processors; reliability;
fLanguage :
English
Journal_Title :
Nuclear Science, IEEE Transactions on
Publisher :
IEEE
ISSN :
0018-9499
Type :
jour
DOI :
10.1109/TNS.2015.2496381
Filename :
7348809