• DocumentCode
    3118520
  • Title
    Experimental evaluation of GPUs radiation sensitivity and algorithm-based fault tolerance efficiency
  • Author
    Rech, P.; Carro, Luigi
  • Author_Institution
    Univ. Fed. do Rio Grande do Sul, Porto Alegre, Brazil
  • fYear
    2013
  • fDate
    8-10 July 2013
  • Firstpage
    244
  • Lastpage
    247
  • Abstract
    Experimental results demonstrate that Graphics Processing Units are highly susceptible to neutron-induced corruption. We have performed several experimental campaigns at ISIS, UK and at LANSCE, Los Alamos, NM, USA, assessing the sensitivity of the GPU internal resources as well as the error rate of common parallel algorithms. The experiments highlight output error patterns and radiation responses that can be used to design optimized Algorithm-Based Fault Tolerance strategies, and they provide pragmatic programming guidelines for increasing code reliability with low computational overhead.
  • Keywords
    computational complexity; error correction codes; fault tolerant computing; graphics processing units; parallel algorithms; performance evaluation; radiation effects; GPU internal resource sensitivity; GPU radiation sensitivity; ISIS UK; LANSCE Los Alamos NM USA; algorithm-based fault tolerance efficiency; code reliability; computational overhead; design optimized algorithm-based fault tolerance strategies; error rate; experimental evaluation; graphic processing units; output error patterns; parallel algorithms; radiation responses; Error correction codes; Graphics processing units; Instruction sets; Neutrons; Parallel processing; Reliability; Sensitivity; GPU; multiple errors; neutron sensitivity; software-based hardening;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    On-Line Testing Symposium (IOLTS), 2013 IEEE 19th International
  • Conference_Location
    Chania
  • Type
    conf
  • DOI
    10.1109/IOLTS.2013.6604091
  • Filename
    6604091