• DocumentCode
    1783201
  • Title
    A Case for a Flexible Scalar Unit in SIMT Architecture
  • Author
    Yi Yang ; Ping Xiang ; Mantor, Michael ; Rubin, Norman ; Hsu, L. ; Qunfeng Dong ; Huiyang Zhou
  • Author_Institution
    Dept. of CSA, NEC Labs., Princeton, NJ, USA
  • fYear
    2014
  • fDate
    19-23 May 2014
  • Firstpage
    93
  • Lastpage
    102
  • Abstract
    The wide availability and the Single-Instruction Multiple-Thread (SIMT)-style programming model have made graphics processing units (GPUs) a promising choice for high performance computing. However, because of SIMT-style processing, an instruction will be executed in every thread even if the operands are identical for all the threads. To overcome this inefficiency, AMD's latest Graphics Core Next (GCN) architecture integrates a scalar unit into a SIMT unit. In GCN, both the SIMT unit and the scalar unit share a single SIMT-style instruction stream. Depending on its type, an instruction is issued to either a scalar or a SIMT unit. In this paper, we propose to extend the scalar unit so that it can either share the instruction stream with the SIMT unit or execute a separate instruction stream. The program to be executed by the scalar unit is referred to as a scalar program and its purpose is to assist SIMT-unit execution. The scalar programs are either generated from SIMT programs automatically by the compiler or manually developed by expert developers. We make a case for our proposed flexible scalar unit through three collaborative execution paradigms: data prefetching, control divergence elimination, and scalar-workload extraction. Our experimental results show that significant performance gains can be achieved using our proposed approaches compared to the state-of-the-art SIMT-style processing.
  • Keywords
    graphics processing units; instruction sets; multi-threading; multiprocessing systems; program compilers; storage management; AMD; GCN architecture; GPUs; SIMT architecture; SIMT style processing; SIMT-unit execution; collaborative execution paradigms; compiler; control divergence elimination; data prefetching; flexible scalar unit; graphics core next architecture; graphics processing units; high performance computing; scalar program; scalar-workload extraction; single SIMT style instruction stream; single-instruction multiple-thread-style programming model; Abstracts; Distributed processing; Vectors; GPGPU; SIMT; Vector unit; scalar unit;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Processing Symposium, 2014 IEEE 28th International
  • Conference_Location
    Phoenix, AZ
  • ISSN
    1530-2075
  • Print_ISBN
    978-1-4799-3799-8
  • Type
    conf
  • DOI
    10.1109/IPDPS.2014.21
  • Filename
    6877245