Regional Information Center for Science and Technology - To GPU synchronize or not GPU synchronize?

DocumentCode :

3385620

Title :

To GPU synchronize or not GPU synchronize?

Author :

Feng, Wu-chun ; Xiao, Shucai

Author_Institution :

Dept. of Comput. Sci., Virginia Tech, Blacksburg, VA, USA

fYear :

2010

fDate :

May 30 - June 2, 2010

Firstpage :

3801

Lastpage :

3804

Abstract :

The graphics processing unit (GPU) has evolved from a fixed-function processor with programmable stages into a programmable processor with many fixed-function components that deliver massive parallelism. By modifying the GPU's stream processor to support "general-purpose computation" on the GPU (GPGPU), applications that perform massive vector operations can realize many orders-of-magnitude improvement in performance over a traditional processor, i.e., the CPU. However, the breadth of general-purpose computation that can be efficiently supported on a GPU has largely been limited to highly data-parallel or task-parallel applications, because there is no explicit support for communication between the streaming multiprocessors (SMs) on the GPU. Such communication can occur via the GPU's global memory, but it then requires a barrier synchronization across the SMs to complete the communication. Although our previous work demonstrated that implementing barrier synchronization on the GPU itself can significantly improve performance and deliver correct results in critical bioinformatics applications, the correctness of inter-SM communication can only be guaranteed if a memory consistency model is assumed. To address this problem, NVIDIA recently introduced the __threadfence() function in CUDA 2.2, a function that can guarantee the correctness of GPU-based inter-SM communication. However, this function currently introduces so much overhead that, when it is used in (direct) GPU synchronization, GPU synchronization actually performs worse than indirect synchronization via the CPU, thus raising the question of whether "to GPU synchronize or not GPU synchronize?"
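The two options the abstract weighs can be sketched on the CPU, under loudly stated assumptions: std::thread workers stand in for thread blocks on different SMs, a shared std::atomic counter stands in for the barrier variable in global memory, and std::atomic_thread_fence stands in for __threadfence(). This is not CUDA and not the authors' implementation; CounterBarrier, run_rounds_barrier, and run_rounds_relaunch are hypothetical names. The "direct" version keeps workers alive across rounds and meets at the in-memory barrier; the "indirect" version relaunches workers each phase and relies on join() as the implicit barrier, loosely analogous to host-side synchronization between kernel launches.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical CPU stand-in for a barrier variable kept in GPU global memory.
class CounterBarrier {
public:
    explicit CounterBarrier(int participants) : n_(participants) {
        assert(participants > 0);
    }

    // round is 1-based. The target grows as round * n_, so the counter is
    // never reset (a reset would race with workers arriving early).
    void wait(int round) {
        // Stand-in for __threadfence(): publish this worker's earlier writes
        // before announcing arrival.
        std::atomic_thread_fence(std::memory_order_release);
        count_.fetch_add(1, std::memory_order_relaxed);
        const int goal = round * n_;
        // '<' rather than '!=': a fast worker may already have arrived at the
        // next barrier and pushed the count past this goal.
        while (count_.load(std::memory_order_relaxed) < goal) {
            std::this_thread::yield();
        }
        // Pairs with the other workers' release fences.
        std::atomic_thread_fence(std::memory_order_acquire);
    }

private:
    const int n_;
    std::atomic<int> count_{0};
};

// "Direct" (GPU-style) analogue: persistent workers meet at the barrier.
// Round r: worker i writes its slot, waits, then reads its neighbour's slot.
// Double buffering (r % 2) keeps round r+1 writes from clobbering values a
// slow worker may still be reading from round r.
std::vector<long> run_rounds_barrier(int workers, int rounds) {
    std::vector<std::vector<int>> buf(2, std::vector<int>(workers, 0));
    std::vector<long> sums(workers, 0);
    CounterBarrier barrier(workers);
    std::vector<std::thread> pool;
    for (int i = 0; i < workers; ++i) {
        pool.emplace_back([&, i] {
            for (int r = 1; r <= rounds; ++r) {
                std::vector<int>& cur = buf[r % 2];
                cur[i] = r * 100 + i;               // communicate via shared memory
                barrier.wait(r);                    // every write now visible
                sums[i] += cur[(i + 1) % workers];  // read neighbour's value
            }
        });
    }
    for (std::thread& t : pool) t.join();
    return sums;
}

// "Indirect" (CPU-style) analogue: each phase is a fresh "launch", and the
// host's join() calls act as the implicit barrier between phases.
std::vector<long> run_rounds_relaunch(int workers, int rounds) {
    std::vector<int> buf(workers, 0);
    std::vector<long> sums(workers, 0);
    auto launch = [workers](auto body) {
        std::vector<std::thread> grid;
        for (int i = 0; i < workers; ++i) grid.emplace_back(body, i);
        for (std::thread& t : grid) t.join();
    };
    for (int r = 1; r <= rounds; ++r) {
        launch([&](int i) { buf[i] = r * 100 + i; });
        launch([&](int i) { sums[i] += buf[(i + 1) % workers]; });
    }
    return sums;
}
```

Both functions should return the same per-worker sums; for example, run_rounds_barrier(4, 5) and run_rounds_relaunch(4, 5) each yield {1505, 1510, 1515, 1500}. The fences are what make the direct version correct: without the release/acquire pair, a worker could observe the counter reach its goal yet still read a stale neighbour value, which mirrors the consistency problem __threadfence() addresses. One difference the analogue hides: OS threads are preempted, so spinning workers always make progress, whereas on a GPU a spin barrier of this kind can deadlock unless every participating block is resident at the same time.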

Keywords :

computer graphic equipment; coprocessors; parallel architectures; synchronisation; vector processor systems; CUDA; GPU; NVIDIA; barrier synchronization; fixed function processor; general purpose computation; graphic processing unit; programmable processor; streaming multiprocessor; task parallel application; vector operation; Application software; Bioinformatics; Central Processing Unit; Computer architecture; Computer science; Graphical processing unit; Graphics processing unit; Interpolation; Parallel processing; Rendering (computer graphics);

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Proceedings of the 2010 IEEE International Symposium on Circuits and Systems (ISCAS)

Conference_Location :

Paris

Print_ISBN :

978-1-4244-5308-5

Electronic_ISBN :

978-1-4244-5309-2

Type :

conf

DOI :

10.1109/ISCAS.2010.5537722

Filename :

5537722

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3385620