DocumentCode
2026457
Title
Acceleration of dRMSD calculation and efficient usage of GPU caches
Author
Filipovic, Jiri ; Plhak, Jan ; Strelak, David
Author_Institution
Fac. of Inf., Masaryk Univ., Brno, Czech Republic
fYear
2015
fDate
20-24 July 2015
Firstpage
47
Lastpage
54
Abstract
In this paper, we introduce GPU acceleration of the dRMSD algorithm, which is used to compare different structures of a molecule. Compared to a multithreaded CPU implementation, we have reached a 13.4× speedup in clustering and a 62.7× speedup in 1:1 dRMSD computation using a mid-end GPU. The dRMSD computation exhibits strong memory locality and is thus compute-bound. Along with a conservative implementation using shared memory, we have implemented variants of the algorithm that use GPU caches to maintain memory locality. Our cache-based implementation reaches 96.5% and 91.6% of shared memory performance on Fermi and Maxwell, respectively. We have identified several performance pitfalls related to cache blocking in compute-bound codes and suggest optimization techniques to improve performance.
Keywords
cache storage; graphics processing units; multi-threading; optimisation; pattern clustering; shared memory systems; Fermi; GPU acceleration; GPU caches; Maxwell; cache blocking; clustering speedup; compute-bound codes; dRMSD calculation; memory locality; mid-end GPU; multithreaded CPU implementation; optimization techniques; shared memory; shared memory performance; Bandwidth; Computer architecture; Graphics processing units; Instruction sets; Kernel; Optimization; Registers; GPU; RMSD; cache; code optimization;
fLanguage
English
Publisher
IEEE
Conference_Titel
High Performance Computing & Simulation (HPCS), 2015 International Conference on
Conference_Location
Amsterdam
Print_ISBN
978-1-4673-7812-3
Type
conf
DOI
10.1109/HPCSim.2015.7237020
Filename
7237020