• DocumentCode
    2536754
  • Title

    Toward Harnessing DOACROSS Parallelism for Multi-GPGPUs

  • Author

    Di, Peng ; Wan, Qing ; Zhang, Xuemeng ; Wu, Hui ; Xue, Jingling

  • Author_Institution
    Programming Languages & Compilers Group, UNSW, Sydney, NSW, Australia
  • fYear
    2010
  • fDate
    13-16 Sept. 2010
  • Firstpage
    40
  • Lastpage
    50
  • Abstract
    To exploit the full potential of GPGPUs for general-purpose computing, the DOACROSS parallelism abundant in scientific and engineering applications must be harnessed. However, the presence of cross-iteration data dependences in DOACROSS loops poses an obstacle to executing their computations concurrently using a massive number of fine-grained threads. This work focuses on iterative PDE solvers rich in DOACROSS parallelism to identify optimization principles and strategies that allow their efficient mapping to GPGPUs. Our main finding is that certain DOACROSS loops can be accelerated further on GPGPUs if they are algorithmically restructured (by a domain expert) to be more amenable to GPGPU parallelization, judiciously optimized (by the compiler), and carefully tuned (by a performance-tuning tool). We substantiate this finding with a case study by presenting a new parallel SSOR method that admits more efficient data-parallel SIMD execution than red-black SOR on GPGPUs. Our solution is obtained non-conventionally, by starting from a K-layer SSOR method and then parallelizing it by applying a non-dependence-preserving scheme consisting of a new domain decomposition technique followed by a generalized loop tiling. Despite its relatively slower convergence, our new method outperforms red-black SOR by striking a better balance between data reuse and parallelism and by trading off convergence rate for SIMD parallelism. Our experimental results highlight the importance of synergy between domain experts, compiler optimizations and performance tuning in maximizing the performance of applications, particularly PDE-based DOACROSS loops, on GPGPUs.
  • Keywords
    iterative methods; multiprocessing systems; optimisation; parallel processing; partial differential equations; DOACROSS parallelism; K-layer SSOR method; PDE based DOACROSS loops; cross iteration data; data parallel SIMD execution; domain decomposition technique; general purpose computing; generalized loop tiling; graphics processing unit; iterative PDE solvers; multiGPGPU; nondependence preserving scheme; optimization principles; parallel SSOR method; partial differential equations; red-black SOR; Convergence; Graphics processing unit; Instruction sets; Kernel; Optimization; Parallel processing; Tiles; DOACROSS Parallelism; GPGPU; Loop Tiling; SOR;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Parallel Processing (ICPP), 2010 39th International Conference on
  • Conference_Location
    San Diego, CA
  • ISSN
    0190-3918
  • Print_ISBN
    978-1-4244-7913-9
  • Electronic_ISBN
    0190-3918
  • Type

    conf

  • DOI
    10.1109/ICPP.2010.13
  • Filename
    5599223