Title :
A Case for a Flexible Scalar Unit in SIMT Architecture
Author :
Yi Yang ; Ping Xiang ; Michael Mantor ; Norman Rubin ; L. Hsu ; Qunfeng Dong ; Huiyang Zhou
Author_Institution :
Dept. of CSA, NEC Labs., Princeton, NJ, USA
Abstract :
The wide availability and the Single-Instruction Multiple-Thread (SIMT)-style programming model have made graphics processing units (GPUs) a promising choice for high-performance computing. However, because of SIMT-style processing, an instruction is executed in every thread even if its operands are identical for all threads. To overcome this inefficiency, AMD's latest Graphics Core Next (GCN) architecture integrates a scalar unit into a SIMT unit. In GCN, the SIMT unit and the scalar unit share a single SIMT-style instruction stream. Depending on its type, an instruction is issued to either the scalar unit or the SIMT unit. In this paper, we propose to extend the scalar unit so that it can either share the instruction stream with the SIMT unit or execute a separate instruction stream. The program executed by the scalar unit is referred to as a scalar program, and its purpose is to assist SIMT-unit execution. Scalar programs are either generated from SIMT programs automatically by the compiler or developed manually by expert developers. We make a case for our proposed flexible scalar unit through three collaborative execution paradigms: data prefetching, control divergence elimination, and scalar-workload extraction. Our experimental results show that significant performance gains can be achieved using our proposed approaches compared to state-of-the-art SIMT-style processing.
Keywords :
graphics processing units; instruction sets; multi-threading; multiprocessing systems; program compilers; storage management; AMD; GCN architecture; GPUs; SIMT architecture; SIMT style processing; SIMT-unit execution; collaborative execution paradigms; compiler; control divergence elimination; data prefetching; flexible scalar unit; graphics core next architecture; graphics processing units; high performance computing; scalar program; scalar-workload extraction; single SIMT style instruction stream; single-instruction multiple-thread-style programming model; Abstracts; Distributed processing; Vectors; GPGPU; SIMT; Vector unit; scalar unit;
Conference_Title :
Parallel and Distributed Processing Symposium, 2014 IEEE 28th International
Conference_Location :
Phoenix, AZ
Print_ISBN :
978-1-4799-3799-8
DOI :
10.1109/IPDPS.2014.21