A Case for a Flexible Scalar Unit in SIMT Architecture

Author

Yi Yang; Ping Xiang; Michael Mantor; Norman Rubin; L. Hsu; Qunfeng Dong; Huiyang Zhou

Author_Institution

Dept. of CSA, NEC Labs., Princeton, NJ, USA

fYear

2014

fDate

19-23 May 2014

Firstpage

93

Lastpage

102

Abstract

Their wide availability and Single-Instruction Multiple-Thread (SIMT)-style programming model have made graphics processing units (GPUs) a promising choice for high-performance computing. However, under SIMT-style processing, an instruction is executed in every thread even if its operands are identical across all threads. To overcome this inefficiency, AMD's latest Graphics Core Next (GCN) architecture integrates a scalar unit into a SIMT unit. In GCN, the SIMT unit and the scalar unit share a single SIMT-style instruction stream. Depending on its type, an instruction is issued to either the scalar unit or the SIMT unit. In this paper, we propose to extend the scalar unit so that it can either share the instruction stream with the SIMT unit or execute a separate instruction stream. The program executed by the scalar unit is referred to as a scalar program, and its purpose is to assist SIMT-unit execution. Scalar programs are either generated automatically from SIMT programs by the compiler or developed manually by expert developers. We make a case for our proposed flexible scalar unit through three collaborative execution paradigms: data prefetching, control divergence elimination, and scalar-workload extraction. Our experimental results show that significant performance gains can be achieved using our proposed approaches compared to state-of-the-art SIMT-style processing.

Keywords

graphics processing units; instruction sets; multi-threading; multiprocessing systems; program compilers; storage management; AMD; GCN architecture; GPUs; SIMT architecture; SIMT style processing; SIMT-unit execution; collaborative execution paradigms; compiler; control divergence elimination; data prefetching; flexible scalar unit; graphics core next architecture; high performance computing; scalar program; scalar-workload extraction; single SIMT style instruction stream; single-instruction multiple-thread-style programming model; Abstracts; Distributed processing; Vectors; GPGPU; SIMT; Vector unit; scalar unit

fLanguage

English

Publisher

IEEE

Conference_Title

Parallel and Distributed Processing Symposium, 2014 IEEE 28th International

Conference_Location

Phoenix, AZ

ISSN

1530-2075

Print_ISBN

978-1-4799-3799-8

Type

conf

DOI

10.1109/IPDPS.2014.21

Filename

6877245