• DocumentCode
    1796530
  • Title

    Padding free bank conflict resolution for CUDA-based matrix transpose algorithm

  • Author

    Khan, Ajmal ; Al-Mouhamed, Mayez ; Fatayar, A. ; Almousa, A. ; Baqais, A. ; Assayony, M.

  • Author_Institution
    Dept. of Comput. Eng., King Fahd Univ. of Pet. & Miner., Dhahran, Saudi Arabia
  • fYear
    2014
  • fDate
    June 30 2014-July 2 2014
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    Matrix transposition is an important linear algebra procedure with a deep impact on many computational science and engineering applications. Several factors hinder the expected performance of large matrix transposes on graphics processing units (GPUs). The performance degradation stems from memory access patterns: whether accesses to global memory are coalesced, and whether accesses to the shared memory of the GPU's streaming multiprocessors cause bank conflicts. In this paper, two matrix transpose algorithms are proposed that ensure coalesced global memory access and conflict-free shared memory bank access. The proposed algorithms have execution times comparable to the NVIDIA SDK bank-conflict-free matrix transpose implementation. Their main advantage is that they eliminate bank conflicts while allocating shared memory exactly equal to the tile size (T × T) of the problem space, whereas, to the best of our knowledge, published approaches need to allocate a padded T × (T + 1) space. We have also applied the proposed transpose algorithm to the recursive Gaussian implementation in the NVIDIA SDK and achieved about a 6% improvement in performance.
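The trade-off between a T × T tile and a padded T × (T + 1) tile can be illustrated with a small bank-index simulation. This is only a sketch under stated assumptions: 32 shared memory banks of one 4-byte word each, a tile width of T = 32, and one warp of 32 threads per row or column. The `skewed` layout is a generic diagonal (rotated-column) indexing scheme, used here as a stand-in for a padding-free layout. The abstract does not describe the authors' two algorithms, so this should not be read as their method.

```python
from collections import Counter

NUM_BANKS = 32  # assumption: 32 banks, one 4-byte word wide
T = 32          # assumption: tile width equals the warp size


def plain(row, col):
    """Word address in an unpadded T x T tile."""
    return row * T + col


def padded(row, col):
    """Word address in a T x (T + 1) tile (one padding word per row)."""
    return row * (T + 1) + col


def skewed(row, col):
    """Word address in a T x T tile whose rows are rotated by the row index.

    Illustrative padding-free layout, not taken from the paper.
    """
    return row * T + (row + col) % T


def conflict_degree(word_addresses):
    """Largest number of threads in one warp that hit the same bank."""
    banks = Counter(addr % NUM_BANKS for addr in word_addresses)
    return max(banks.values())


def worst_case(index):
    """Worst conflict degree over all row accesses and all column accesses."""
    by_col = max(conflict_degree([index(r, c) for r in range(T)]) for c in range(T))
    by_row = max(conflict_degree([index(r, c) for c in range(T)]) for r in range(T))
    return max(by_col, by_row)


# Shared memory words allocated per tile for each layout.
WORDS_ALLOCATED = {"plain": T * T, "padded": T * (T + 1), "skewed": T * T}

if __name__ == "__main__":
    for name, index in (("plain", plain), ("padded", padded), ("skewed", skewed)):
        print(name, "worst conflict degree:", worst_case(index),
              "words:", WORDS_ALLOCATED[name])
```

Under these assumptions, the unpadded layout serializes a column access into 32 transactions, while both the padded and the skewed layouts spread each row or column across all 32 banks. Only the padded layout pays the extra T words per tile.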
  • Keywords
    graphics processing units; mathematics computing; matrix algebra; parallel architectures; shared memory systems; storage allocation; CUDA-based matrix transpose algorithm; GPU; NVIDIA SDK bank conflict-free matrix transpose; coalesced access; computational engineering application; computational science application; conflict free bank access; graphic processing units; linear algebra procedure; matrix transposition; memory access pattern; padding free bank conflict resolution; recursive Gaussian implementation; shared memory allocation; shared streaming multiprocessor memory; Algorithm design and analysis; Graphics processing units; Indexes; Instruction sets; Kernel; Linear algebra; Writing; Bank conflict free; CUDA GPU; coalesced memory access; linear Algebra solvers; matrix transpose; solving system of linear equations;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), 2014 15th IEEE/ACIS International Conference on
  • Conference_Location
    Las Vegas, NV
  • Type

    conf

  • DOI
    10.1109/SNPD.2014.6888709
  • Filename
    6888709