• DocumentCode
    2170806
  • Title
    Weak execution ordering - exploiting iterative methods on many-core GPUs
  • Author
    Chen, Jianmin; Huang, Zhuo; Su, Feiqi; Peir, Jih-Kwon; Ho, Jeff; Peng, Lu
  • Author_Institution
    Dept. of Comput. & Inf. Sci. & Eng., Univ. of Florida, Gainesville, FL, USA
  • fYear
    2010
  • fDate
    28-30 March 2010
  • Firstpage
    154
  • Lastpage
    163
  • Abstract
    On NVIDIA's many-core GPUs, there is no synchronization function among parallel thread blocks. When large-scale parallel programs executed by multiple thread blocks require fine-grained data communication and synchronization, frequent host synchronizations are necessary, and they incur significant overhead. In this paper, we investigate a class of applications that use a chaotic version of iterative methods [5], [22] to obtain numerical solutions for partial differential equations (PDE). Such a fast PDE solver is parallelized on GPUs with multiple thread blocks. In this parallel implementation, although frequent data communication is needed between adjacent thread blocks, a precise order of the data communication is not necessary. Separate communication threads periodically exchange boundary values with adjacent thread blocks through global memory. Since a precise order of the data communication is not required, the computation and communication threads can be overlapped to alleviate the communication overhead. Performance measurements of two popular applications, Poisson image editing from computer graphics and shape from shading from computer vision, on a Tesla C1060 show that a speedup of 4 to 5 times is achievable for both applications compared with the solution using host synchronization.
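    To make the "no precise ordering" idea concrete, here is a minimal sketch, not the authors' CUDA implementation: a sequential Python simulation of chaotic block relaxation on an assumed 1D Poisson model problem (-u'' = f, zero boundary values). Grid points are split into blocks that each run Jacobi sweeps, but a block sees its neighbors' boundary values only through a halo copy refreshed every `exchange_every` sweeps, standing in for the paper's periodic exchange through global memory. The function and parameter names (`chaotic_block_jacobi`, `exchange_every`, `sweeps`) are hypothetical.

```python
def chaotic_block_jacobi(f, h, num_blocks, exchange_every, sweeps):
    """Hypothetical sketch: block Jacobi for -u'' = f on a 1D grid with
    u = 0 at both ends, where each block reads possibly stale halo values
    of its neighbors (assumed periodic exchange, not the paper's code)."""
    n = len(f)  # number of interior grid points
    if n % num_blocks:
        raise ValueError("num_blocks must divide the number of grid points")
    size = n // num_blocks
    u = [0.0] * n
    halo = [(0.0, 0.0)] * num_blocks  # (left, right) neighbor values per block

    for sweep in range(sweeps):
        if sweep % exchange_every == 0:
            # "Communication" step: publish current boundary values.
            # Between exchanges, blocks keep computing with stale halos.
            halo = []
            for b in range(num_blocks):
                lo, hi = b * size, (b + 1) * size
                left = u[lo - 1] if lo > 0 else 0.0   # physical boundary u = 0
                right = u[hi] if hi < n else 0.0
                halo.append((left, right))

        # "Computation" step: one Jacobi sweep per block.
        new_u = u[:]
        for b in range(num_blocks):
            lo, hi = b * size, (b + 1) * size
            left_halo, right_halo = halo[b]
            for i in range(lo, hi):
                left = u[i - 1] if i > lo else left_halo
                right = u[i + 1] if i < hi - 1 else right_halo
                # Discrete Poisson: 2*u_i - u_{i-1} - u_{i+1} = h^2 * f_i
                new_u[i] = 0.5 * (left + right + h * h * f[i])
        u = new_u
    return u


if __name__ == "__main__":
    n = 16
    h = 1.0 / (n + 1)
    # For f = 1 the discrete solution equals x(1 - x)/2 at the grid points.
    exact = [(i + 1) * h * (1 - (i + 1) * h) / 2 for i in range(n)]
    u = chaotic_block_jacobi([1.0] * n, h, num_blocks=4, exchange_every=3, sweeps=20000)
    print("max error:", max(abs(a - b) for a, b in zip(u, exact)))
```

    Setting `exchange_every=1` recovers ordinary synchronous Jacobi. Larger values let blocks run ahead on stale boundary data; for Jacobi on the discrete Poisson matrix this still converges, which is the classic chaotic-relaxation result of Chazan and Miranker for iteration matrices whose absolute value has spectral radius below 1.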
  • Keywords
    computer graphics; coprocessors; data communication equipment; iterative methods; partial differential equations; Poisson image editing; Tesla C1060; computer graphics; computer vision; data communication; host synchronization; iterative methods; many-core GPU; parallel thread blocks; partial differential equations; shape from shading; weak execution ordering; Application software; Chaotic communication; Computer graphics; Computer vision; Data communication; Iterative methods; Large-scale systems; Partial differential equations; Shape measurement; Yarn;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Performance Analysis of Systems & Software (ISPASS), 2010 IEEE International Symposium on
  • Conference_Location
    White Plains, NY
  • Print_ISBN
    978-1-4244-6023-6
  • Electronic_ISBN
    978-1-4244-6024-3
  • Type
    conf
  • DOI
    10.1109/ISPASS.2010.5452028
  • Filename
    5452028