Title :
Real-world design and evaluation of compiler-managed GPU redundant multithreading
Author :
Wadden, Jack ; Lyashevsky, Alexander ; Gurumurthi, Sudhanva ; Sridharan, Vilas ; Skadron, Kevin
Author_Institution :
Univ. of Virginia, Charlottesville, VA, USA
Abstract :
Reliability for general-purpose processing on the GPU (GPGPU) is becoming a weak link in the construction of reliable supercomputer systems. Because hardware protection is expensive to develop, requires dedicated on-chip resources, and is not portable across different architectures, the efficiency of software solutions such as redundant multithreading (RMT) must be explored. This paper presents a real-world design and evaluation of automatic software RMT on GPU hardware. We first describe a compiler pass that automatically converts GPGPU kernels into redundantly threaded versions. We then perform detailed power and performance evaluations of three RMT algorithms, each of which provides fault coverage to a set of structures in the GPU. Using real hardware, we show that compiler-managed software RMT has highly variable costs. We further analyze the individual costs of redundant work scheduling, redundant computation, and inter-thread communication, showing that, in general, no single component is responsible for high overheads across all applications; instead, certain workload properties tend to cause RMT to perform well or poorly. Finally, we demonstrate the benefit of architectural support for RMT with a specific example of fast, register-level thread communication.
Keywords :
graphics processing units; multi-threading; parallel machines; performance evaluation; power aware computing; program compilers; redundancy; GPGPU kernels; GPU hardware; automatic software RMT; compiler-managed GPU redundant multithreading; interthread communication; power evaluation; real-world design; real-world evaluation; redundant work scheduling; redundant computation; redundantly threaded kernels; register-level thread communication; Computer architecture; Hardware; Kernel; Registers; Reliability
Conference_Title :
Computer Architecture (ISCA), 2014 ACM/IEEE 41st International Symposium on
Conference_Location :
Minneapolis, MN
Print_ISBN :
978-1-4799-4396-8
DOI :
10.1109/ISCA.2014.6853227