• DocumentCode
    2794182
  • Title

    Analyzing soft-error vulnerability on GPGPU microarchitecture

  • Author

    Tan, Jingweijia ; Goswami, Nilanjan ; Li, Tao ; Fu, Xin

  • Author_Institution
    EECS Dept., Univ. of Kansas, Lawrence, KS, USA
  • fYear
    2011
  • fDate
    6-8 Nov. 2011
  • Firstpage
    226
  • Lastpage
    235
  • Abstract
    General-purpose computation on graphics processing units (GPGPU) has become increasingly popular because of the high computational throughput GPUs deliver for data-parallel applications. Modern GPU architectures have limited capability for error detection and tolerance because they were originally designed for graphics processing. General-purpose applications, however, require rigorous execution correctness, which makes reliability a growing concern in GPGPU architecture design. As CMOS processing technologies continue to scale down to the nanoscale, the on-chip soft error rate (SER) has been predicted to increase exponentially, and GPGPUs that integrate hundreds of cores on a single chip are prone to high SER. This paper takes a first step toward characterizing GPGPU reliability in light of soft errors. We develop GPGPU-SODA (GPGPU Software Dependability Analysis), a framework for estimating the soft-error vulnerability of the GPGPU microarchitecture. Using GPGPU-SODA, we observe that several microarchitecture structures in GPGPUs exhibit high soft-error susceptibility and that structure vulnerability is sensitive to workload characteristics (e.g., branch divergence and memory coalescing). We further investigate several architectural optimizations. We find that both dynamic warp formation and increasing the number of threads supported by the GPU substantially affect GPGPU soft-error robustness, whereas changing the warp scheduling policy has only a minor impact on structure vulnerability. These observations give designers useful guidance for building resilient GPGPUs: a comprehensive resiliency solution should consider the entire GPGPU design rather than focus on a single structure.
  • Keywords
    CMOS integrated circuits; error statistics; graphics processing units; CMOS processing technology; GPGPU design; GPGPU microarchitecture; GPGPU reliability; GPGPU-SODA; architectural optimization; comprehensive resiliency; computational throughput; data parallel application; dynamic warp formation; error detection; general-purpose computation; graphic processing unit; graphics processing; soft error rate; soft-error vulnerability; software dependability analysis; structure vulnerability; warp scheduling policy; workload characteristics; Computer architecture; Graphics processing unit; Instruction sets; Microarchitecture; Registers; Reliability;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Workload Characterization (IISWC), 2011 IEEE International Symposium on
  • Conference_Location
    Austin, TX
  • Print_ISBN
    978-1-4577-2063-5
  • Electronic_ISBN
    978-1-4577-2062-8
  • Type

    conf

  • DOI
    10.1109/IISWC.2011.6114182
  • Filename
    6114182