• DocumentCode
    2998766
  • Title
    Simple Memory Machine Models for GPUs
  • Author
    Nakano, Koji
  • Author_Institution
    Dept. of Inf. Eng., Hiroshima Univ., Higashi-Hiroshima, Japan
  • fYear
    2012
  • fDate
    21-25 May 2012
  • Firstpage
    794
  • Lastpage
    803
  • Abstract
    The main contribution of this paper is to introduce two parallel memory machines, the Discrete Memory Machine (DMM) and the Unified Memory Machine (UMM). Unlike well-studied theoretical parallel computational models such as PRAMs, these parallel memory machines are practical and capture the essential features of memory access on NVIDIA GPUs. As a first step in developing algorithmic techniques on the DMM and the UMM, we evaluate the computing time for contiguous access and stride access to the memory on these models. We then present parallel algorithms to transpose a two-dimensional array on these models. Finally, we show that, for any given permutation, data in an array can be moved along that permutation on both the DMM and the UMM. Since the computing time of our permutation algorithms on the DMM and the UMM is equal to the sum of the lower bounds obtained from the memory bandwidth limitation and the latency overhead, they are optimal from the theoretical point of view.
  • Keywords
    graphics processing units; parallel machines; parallel memories; random-access storage; 2D array; NVIDIA GPU; PRAM; computing time; contiguous access; discrete memory machine; latency overhead; memory access; memory bandwidth limitation; parallel algorithm; parallel computational model; parallel memory machines; permutation algorithm; simple memory machine model; unified memory machine; Arrays; Bandwidth; Computational modeling; Graphics processing unit; Parallel processing; Phase change random access memory; CUDA; GPU; array permutation; matrix transpose; memory banks; parallel algorithms; parallel computing models; stride memory access;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2012 IEEE 26th International
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-1-4673-0974-5
  • Type
    conf
  • DOI
    10.1109/IPDPSW.2012.98
  • Filename
    6270721