DocumentCode :
244347
Title :
Warped-Shield: Tolerating Hard Faults in GPGPUs
Author :
Dweik, Waleed ; Abdel-Majeed, M. ; Annavaram, Murali
Author_Institution :
Ming Hsieh Dept. of Electr. Eng., Univ. of Southern California, Los Angeles, CA, USA
fYear :
2014
fDate :
23-26 June 2014
Firstpage :
431
Lastpage :
442
Abstract :
Graphics processing units (GPUs) are rapidly becoming the parallel accelerators of choice for running general purpose applications. GPUs that run general purpose applications are termed GPGPUs. Many mission-critical and long-running scientific applications are being ported to run on GPGPUs. These applications demand strong computational integrity. GPGPUs, like many other digital components, face imminent reliability threats due to technology scaling. Of particular concern are infield hard faults, which are persistent and irreversible. GPGPUs comprise dozens of streaming processors, where each streaming processor employs tens of execution units organized as single instruction multiple thread (SIMT) lanes to deliver massive parallel computational power. In this paper we exploit the massive replication of SIMT lanes to tolerate infield hard faults. First, we introduce thread shuffling to reroute threads originally mapped to faulty SIMT lanes to idle healthy lanes. Thread shuffling is insufficient when the number of healthy SIMT lanes is smaller than the number of active threads. To broaden the reach of thread shuffling, we propose dynamic warp deformation, which splits the warp into multiple sub-warps; each sub-warp uses fewer SIMT lanes, thereby providing more opportunities to avoid a faulty SIMT lane. Finally, we propose warp shuffling, which exploits non-uniform degradation of different streaming processors by scheduling a warp to a streaming processor that requires fewer warp splits. Hence, warp shuffling helps to reduce the performance overhead associated with dynamic warp deformation. By deploying the proposed techniques, we can tolerate the worst-case scenario of up to three hard faults per four-lane SIMT cluster with at most 36% performance degradation.
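The interplay of thread shuffling and dynamic warp deformation described above can be illustrated with a minimal software sketch. It is not the paper's hardware design: the function name plan_cluster, the dictionary-based mapping, and the policy of keeping a thread on its own lane when that lane is healthy are all assumptions made for illustration. Only the four-lane cluster size comes from the abstract.

```python
def plan_cluster(active, faulty, cluster_size=4):
    """Hypothetical planner for one SIMT lane cluster.

    active: set of thread slots (0..cluster_size-1) that must execute.
    faulty: set of lane indices with hard faults.
    Returns a list of sub-warps; each sub-warp is a dict {thread: lane}.
    One sub-warp means thread shuffling alone was enough; more than one
    means the warp had to be split (warp deformation).
    """
    healthy = [lane for lane in range(cluster_size) if lane not in faulty]
    threads = [t for t in range(cluster_size) if t in active]
    if not threads:
        return []
    if not healthy:
        raise ValueError("no healthy lane left in this cluster")

    sub_warps = []
    # Issue at most len(healthy) threads per sub-warp.
    for start in range(0, len(threads), len(healthy)):
        group = threads[start:start + len(healthy)]
        # Assumed policy: a thread whose own lane is healthy stays put.
        mapping = {t: t for t in group if t in healthy}
        free = [lane for lane in healthy if lane not in mapping.values()]
        # Thread shuffling: reroute the remaining threads to idle healthy lanes.
        for t in group:
            if t not in mapping:
                mapping[t] = free.pop(0)
        sub_warps.append(mapping)
    return sub_warps


if __name__ == "__main__":
    print(plan_cluster({0, 1}, {1}))
    print(plan_cluster({0, 1, 2, 3}, {0, 1, 2}))
```

With two active threads and one faulty lane, the sketch returns a single sub-warp, so shuffling alone suffices. With all four threads active and three faulty lanes (the worst case named in the abstract), it returns four sub-warps that each reuse the one healthy lane, which is where the performance overhead comes from.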
Keywords :
fault tolerant computing; graphics processing units; multi-threading; parallel processing; scheduling; GPGPUs; SIMT lanes; computational integrity; dynamic warp deformation; general purpose applications; graphics processing units; infield hard fault tolerance; long-running scientific application; mission-critical scientific application; parallel accelerators; parallel computational power; performance overhead reduction; single instruction multiple thread lanes; streaming processors; thread rerouting; thread shuffling; warp scheduling; warp shuffling; warped-shield; Benchmark testing; Fault tolerance; Fault tolerant systems; Instruction sets; Optimized production technology; Registers; Single instruction multiple threads (SIMT); thread shuffling; warp deformation; warp shuffling;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Dependable Systems and Networks (DSN), 2014 44th Annual IEEE/IFIP International Conference on
Conference_Location :
Atlanta, GA
Type :
conf
DOI :
10.1109/DSN.2014.95
Filename :
6903600