Regional Information Center for Science and Technology - Improving GPGPU resource utilization through alternative thread block scheduling

DocumentCode :

157775

Title :

Improving GPGPU resource utilization through alternative thread block scheduling

Author :

Minseok Lee ; Seokwoo Song ; Joosik Moon ; Jung-Ho Kim ; Woong Seo ; Yeongon Cho ; Soojung Ryu

Author_Institution :

KAIST, Daejeon, South Korea

fYear :

2014

fDate :

15-19 Feb. 2014

Firstpage :

260

Lastpage :

271

Abstract :

High performance in GPGPU workloads is obtained by maximizing parallelism and fully utilizing the available resources. Thousands of threads are assigned to each core in units of CTAs (Cooperative Thread Arrays), or thread blocks, with each thread block consisting of multiple warps or wavefronts. The scheduling of these threads can have a significant impact on overall performance. In this work, we explore alternative thread block or CTA scheduling; in particular, we exploit the interaction between the thread block scheduler and the warp scheduler to improve performance. We explore two aspects of thread block scheduling: (1) LCS (lazy CTA scheduling), which restricts the maximum number of thread blocks allocated to each core, and (2) BCS (block CTA scheduling), where consecutive thread blocks are assigned to the same core. For LCS, we leverage a greedy warp scheduler to help determine the optimal number of thread blocks by measuring only the number of instructions issued, while for BCS, we propose an alternative warp scheduler that is aware of the “block” of CTAs allocated to a core. With LCS and the observation that the maximum number of CTAs does not necessarily maximize performance, we also propose mixed concurrent kernel execution, which enables multiple kernels to be allocated to the same core to maximize resource utilization and improve overall performance.
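The two scheduling ideas named in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the abstract does not give the mechanisms in detail, so the function names, the round-robin baseline, the `block_size` parameter, and in particular the ratio used in `estimate_cta_limit` are all assumptions made for illustration only.

```python
from collections import defaultdict


def assign_round_robin(num_ctas, num_cores):
    """Assumed baseline: CTA i is placed on core (i mod num_cores)."""
    cores = defaultdict(list)
    for cta in range(num_ctas):
        cores[cta % num_cores].append(cta)
    return dict(cores)


def assign_block(num_ctas, num_cores, block_size=2):
    """BCS-style placement: consecutive CTAs form a group of `block_size`,
    and each whole group is placed on one core (groups go round-robin)."""
    cores = defaultdict(list)
    for cta in range(num_ctas):
        group = cta // block_size
        cores[group % num_cores].append(cta)
    return dict(cores)


def estimate_cta_limit(issued_per_cta, hw_max):
    """LCS-style cap on resident CTAs per core.

    HYPOTHETICAL heuristic (not taken from the abstract): under a greedy
    warp scheduler, compare the total instructions issued on a core with
    the count issued by the busiest CTA, and keep only as many CTAs as
    that ratio suggests, never exceeding the hardware maximum `hw_max`.
    """
    total = sum(issued_per_cta)
    peak = max(issued_per_cta, default=0)
    if peak == 0:
        return hw_max  # nothing measured yet: fall back to the hardware limit
    needed = -(-total // peak)  # ceiling division on integers
    return min(hw_max, needed)


if __name__ == "__main__":
    print(assign_round_robin(8, 2))
    print(assign_block(8, 2, block_size=2))
    print(estimate_cta_limit([100, 40, 30, 10], hw_max=8))
```

With 8 CTAs and 2 cores, the baseline interleaves CTAs across cores, whereas the block placement keeps neighboring CTAs (0 and 1, 2 and 3, and so on) on the same core. The cap function only shows how a limit below the hardware maximum could be derived from issue counts alone; the paper's actual criterion may differ.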

Keywords :

concurrency control; graphics processing units; multi-threading; resource allocation; scheduling; BCS; GPGPU resource utilization improvement; GPGPU workloads; LCS; block CTA scheduling; cooperative thread arrays; greedy warp scheduler; high-performance computing; lazy CTA scheduling; mixed concurrent kernel execution; multiple wavefronts; optimal thread blocks; parallelism maximization; performance improvement; resource utilization maximization; thread block scheduler; thread-block scheduling; Graphics processing units; Hardware; Instruction sets; Kernel; Memory management; Resource management; Scheduling;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

High Performance Computer Architecture (HPCA), 2014 IEEE 20th International Symposium on

Conference_Location :

Orlando, FL

Type :

conf

DOI :

10.1109/HPCA.2014.6835937

Filename :

6835937

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=157775