• DocumentCode
    2026457
  • Title
    Acceleration of dRMSD calculation and efficient usage of GPU caches
  • Author
    Filipovic, Jiri ; Plhak, Jan ; Strelak, David
  • Author_Institution
    Fac. of Inf., Masaryk Univ., Brno, Czech Republic
  • fYear
    2015
  • fDate
    20-24 July 2015
  • Firstpage
    47
  • Lastpage
    54
  • Abstract
    In this paper, we introduce GPU acceleration of the dRMSD algorithm, which is used to compare different structures of a molecule. Compared to a multithreaded CPU implementation, we have reached a 13.4× speedup in clustering and a 62.7× speedup in 1:1 dRMSD computation using a mid-end GPU. The dRMSD computation exhibits strong memory locality and is thus compute-bound. Along with a conservative implementation using shared memory, we have implemented variants of the algorithm that use GPU caches to maintain memory locality. Our cache-based implementation reaches 96.5% and 91.6% of shared memory performance on Fermi and Maxwell, respectively. We have identified several performance pitfalls related to cache blocking in compute-bound codes and suggest optimization techniques to improve performance.
  • Keywords
    cache storage; graphics processing units; multi-threading; optimisation; pattern clustering; shared memory systems; Fermi; GPU acceleration; GPU caches; Maxwell; cache blocking; clustering speedup; compute-bound codes; dRMSD calculation; memory locality; mid-end GPU; multithreaded CPU implementation; optimization techniques; shared memory; shared memory performance; Bandwidth; Computer architecture; Graphics processing units; Instruction sets; Kernel; Optimization; Registers; GPU; RMSD; cache; code optimization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    High Performance Computing & Simulation (HPCS), 2015 International Conference on
  • Conference_Location
    Amsterdam
  • Print_ISBN
    978-1-4673-7812-3
  • Type
    conf
  • DOI
    10.1109/HPCSim.2015.7237020
  • Filename
    7237020