Title :
Performance modeling in CUDA streams — A means for high-throughput data processing
Author :
Hao Li; Di Yu; Ajit Kumar; Yi-Cheng Tu
Author_Institution :
Dept. of Comput. Sci. & Eng., Univ. of South Florida, Tampa, FL, USA
Abstract :
A push-based database management system (DBMS) is a new type of data processing software that streams large volumes of data to concurrent query operators. The high data rates of such systems demand substantial computing power from the query engine. In our previous work, we built a push-based DBMS named G-SDMS to harness the unrivaled computational capabilities of modern GPUs. A major design goal of G-SDMS is to support concurrent processing of heterogeneous query processing operations and to enable resource allocation among such operations. Understanding the performance of operations as a function of resource consumption is thus a prerequisite in the design of G-SDMS. With NVIDIA's CUDA framework as the system implementation platform, we present our recent work on performance modeling of CUDA kernels running concurrently under a runtime mechanism named CUDA stream. Specifically, we explore the connection between performance and resource occupancy of compute-bound kernels and develop a model that can predict the performance of such kernels. Furthermore, we provide an in-depth anatomy of the CUDA stream mechanism and summarize its main kernel scheduling disciplines. Our models and derived scheduling disciplines are verified by extensive experiments using synthetic and real-world CUDA kernels.
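The CUDA stream mechanism the abstract refers to can be illustrated with a minimal sketch (not taken from the paper; the kernel, sizes, and scale factors are illustrative assumptions): kernels issued to different streams have no implicit ordering between them, so the runtime scheduler may execute them concurrently when resources permit, which is what makes stream-level resource allocation and performance modeling meaningful.

```cuda
// Sketch: two independent kernels launched into separate CUDA streams.
// Because the streams impose no ordering on each other, the GPU may
// overlap their execution if enough SM resources are available.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void scaleKernel(float *d, int n, float f) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= f;
}

int main() {
    const int n = 1 << 20;          // illustrative problem size
    float *a, *b;
    cudaMalloc(&a, n * sizeof(float));
    cudaMalloc(&b, n * sizeof(float));

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    // Same kernel, different streams: candidates for concurrent execution.
    scaleKernel<<<(n + 255) / 256, 256, 0, s1>>>(a, n, 2.0f);
    scaleKernel<<<(n + 255) / 256, 256, 0, s2>>>(b, n, 3.0f);

    // Wait for each stream independently before releasing resources.
    cudaStreamSynchronize(s1);
    cudaStreamSynchronize(s2);

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(a);
    cudaFree(b);
    return 0;
}
```

Whether the two launches actually overlap depends on each kernel's resource occupancy (registers, shared memory, thread blocks), which is precisely the connection to performance that the paper models.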
Keywords :
parallel architectures; performance evaluation; query processing; relational databases; resource allocation; CUDA stream mechanism; G-SDMS; NVIDIA CUDA framework; compute-bound kernels; computing power; concurrent processing; concurrent query operators; data streams; heterogeneous query processing operations; high-throughput data processing software; main kernel scheduling disciplines; performance modeling; push-based DBMS; push-based database management system; query engine; real-world CUDA kernels; resource consumption; resource occupancy; runtime mechanism; synthetic CUDA kernels; system implementation platform; Computational modeling; Graphics processing units; Instruction sets; Kernel; Processor scheduling; Registers; Runtime; CUDA; CUDA stream; DBMS; GPGPU; GPU; push-based systems;
Conference_Titel :
2014 IEEE International Conference on Big Data (Big Data)
Conference_Location :
Washington, DC
DOI :
10.1109/BigData.2014.7004245