DocumentCode :
3396787
Title :
Evaluation of global synchronization for iterative algebra algorithms on many-core
Author :
ul Hasan Khan, Ayaz ; Al-Mouhamed, Mayez ; Firdaus, Lutfi A.
Author_Institution :
Dept. of Comput. Eng., KFUPM, Dhahran, Saudi Arabia
fYear :
2015
fDate :
1-3 June 2015
Firstpage :
1
Lastpage :
6
Abstract :
Massively parallel computing is applied extensively in various scientific and engineering domains. With the growing interest in many-core architectures, and given the lack of explicit support for inter-block synchronization, specifically in GPUs, efficient synchronization becomes necessary to minimize inter-block communication time. In this paper, we propose two new inter-block synchronization techniques: 1) Relaxed Synchronization and 2) Block-Query Synchronization. These schemes are used in implementing numerical iterative solvers, where overlapping computation with communication is one optimization used to enhance application performance. We evaluate and analyze the performance of the proposed synchronization techniques using a Jacobi iterative solver, in comparison with state-of-the-art inter-block lock-free synchronization techniques. We achieve about 1-8% improvement in execution time over lock-free synchronization, depending on the problem size and the number of thread blocks. We also evaluate the proposed algorithm on GPU and MIC architectures and obtain about 8-26% performance improvement over the barrier synchronization available in the OpenMP programming environment, depending on the problem size and the number of cores used.
Keywords :
algebra; application program interfaces; graphics processing units; iterative methods; multiprocessing systems; parallel architectures; synchronisation; GPU; Jacobi iterative solver; MIC architectures; OpenMP programming; barrier synchronization; block-query synchronization; global synchronization; interblock synchronization techniques; iterative algebra algorithms; lock-free synchronization; relaxed synchronization; Computer architecture; Graphics processing units; Instruction sets; Jacobian matrices; Kernel; Microwave integrated circuits; Synchronization; CUDA; GPU; Inter-Block Synchronization; Jacobi Iterative Method; Graphics Processing Unit (GPU); OpenMP; Xeon Phi
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), 2015 16th IEEE/ACIS International Conference on
Conference_Location :
Takamatsu
Type :
conf
DOI :
10.1109/SNPD.2015.7176173
Filename :
7176173