• DocumentCode
    157828
  • Title
    Precision-aware soft error protection for GPUs
  • Author
    Palframan, David J.; Kim, Nam Sung; Lipasti, Mikko H.
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Univ. of Wisconsin - Madison, Madison, WI, USA
  • fYear
    2014
  • fDate
    15-19 Feb. 2014
  • Firstpage
    49
  • Lastpage
    59
  • Abstract
    With the advent of general-purpose GPU computing, it is becoming increasingly desirable to protect GPUs from soft errors. For high computation throughput, GPUs must store a significant amount of state and have many execution units. The high power and area costs of full protection from soft errors make selective protection techniques attractive. Such approaches provide maximum error coverage within a fixed area or power limit, but typically treat all errors equally. We observe that for many floating-point-intensive GPGPU applications, small magnitude errors may have little effect on results, while large magnitude errors can be amplified to have a significant negative impact. We therefore propose a novel precision-aware protection approach for the GPU execution logic and register file to mitigate large magnitude errors. We also propose an architecture modification to optimize error coverage for integer computations. Our approach combines selective logic hardening, targeted checker circuits, and intelligent register file encoding for best error protection. We demonstrate that our approach can reduce the mean error magnitude by up to 87% compared to a traditional selective protection approach with the same overhead.
  • Keywords
    encoding; floating point arithmetic; graphics processing units; GPU execution logic; GPU register file; GPU soft error protection; error coverage optimization; floating-point-intensive GPGPU applications; general-purpose GPU computing; integer computations; intelligent register file encoding; mean error magnitude reduction; precision-aware soft error protection; selective logic hardening; selective protection approach; targeted checker circuits; Computer architecture; Error correction codes; Graphics processing units; Logic gates; Registers; Timing; Vectors;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    High Performance Computer Architecture (HPCA), 2014 IEEE 20th International Symposium on
  • Conference_Location
    Orlando, FL
  • Type
    conf
  • DOI
    10.1109/HPCA.2014.6835966
  • Filename
    6835966