DocumentCode
1783201
Title
A Case for a Flexible Scalar Unit in SIMT Architecture
Author
Yi Yang ; Ping Xiang ; Mantor, Michael ; Rubin, Norman ; Hsu, L. ; Qunfeng Dong ; Huiyang Zhou
Author_Institution
Dept. of CSA, NEC Labs., Princeton, NJ, USA
fYear
2014
fDate
19-23 May 2014
Firstpage
93
Lastpage
102
Abstract
The wide availability and the Single-Instruction Multiple-Thread (SIMT)-style programming model have made graphics processing units (GPUs) a promising choice for high-performance computing. However, because of SIMT-style processing, an instruction is executed in every thread even if its operands are identical across all threads. To overcome this inefficiency, AMD's latest Graphics Core Next (GCN) architecture integrates a scalar unit into a SIMT unit. In GCN, the SIMT unit and the scalar unit share a single SIMT-style instruction stream. Depending on its type, an instruction is issued to either the scalar unit or the SIMT unit. In this paper, we propose to extend the scalar unit so that it can either share the instruction stream with the SIMT unit or execute a separate instruction stream. The program executed by the scalar unit is referred to as a scalar program, and its purpose is to assist SIMT-unit execution. Scalar programs are either generated automatically from SIMT programs by the compiler or developed manually by expert developers. We make a case for the proposed flexible scalar unit through three collaborative execution paradigms: data prefetching, control divergence elimination, and scalar-workload extraction. Our experimental results show that significant performance gains can be achieved with the proposed approaches compared to state-of-the-art SIMT-style processing.
Keywords
graphics processing units; instruction sets; multi-threading; multiprocessing systems; program compilers; storage management; AMD; GCN architecture; GPUs; SIMT architecture; SIMT style processing; SIMT-unit execution; collaborative execution paradigms; compiler; control divergence elimination; data prefetching; flexible scalar unit; graphics core next architecture; graphics processing units; high performance computing; scalar program; scalar-workload extraction; single SIMT style instruction stream; single-instruction multiple-thread-style programming model; Abstracts; Distributed processing; Vectors; GPGPU; SIMT; Vector unit; scalar unit;
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel and Distributed Processing Symposium, 2014 IEEE 28th International
Conference_Location
Phoenix, AZ
ISSN
1530-2075
Print_ISBN
978-1-4799-3799-8
Type
conf
DOI
10.1109/IPDPS.2014.21
Filename
6877245