• DocumentCode
    1783245
  • Title
    Improving the Performance of CA-GMRES on Multicores with Multiple GPUs
  • Author
    Yamazaki, Ichitaro ; Anzt, Hartwig ; Tomov, Stanimire ; Hoemmen, Mark ; Dongarra, Jack
  • Author_Institution
    Univ. of Tennessee, Knoxville, TN, USA
  • fYear
    2014
  • fDate
    19-23 May 2014
  • Firstpage
    382
  • Lastpage
    391
  • Abstract
    The Generalized Minimum Residual (GMRES) method is one of the most widely used iterative methods for solving nonsymmetric linear systems of equations. In recent years, techniques to avoid communication in GMRES have gained attention because, compared with floating-point operations, communication is becoming increasingly expensive on modern computers. Since graphics processing units (GPUs) are now becoming a crucial component in computing, we investigate the effectiveness of these techniques on multicore CPUs with multiple GPUs. While we present detailed performance studies of a matrix powers kernel on multiple GPUs, we particularly focus on orthogonalization strategies, which have a great impact on both the numerical stability and the performance of GMRES, especially as the matrix becomes sparser or more ill-conditioned. We present experimental results on two eight-core Intel Sandy Bridge CPUs with three NVIDIA Fermi GPUs and demonstrate that significant speedups can be obtained by avoiding communication, either on a GPU or between the GPUs. As part of our study, we investigate several optimization techniques for the GPU kernels that can also be used in other iterative solvers besides GMRES. Hence, our studies not only emphasize the importance of avoiding communication on GPUs, but also provide insight into the effects of these optimization techniques on the performance of sparse solvers, and may have an impact beyond GMRES.
  • Keywords
    graphics processing units; iterative methods; multiprocessing systems; optimisation; CA-GMRES; CPU; NVIDIA Fermi GPU; eight-core Intel Sandy Bridge CPU; generalized minimum residual method; graphics processing units; iterative methods; iterative solvers; matrix powers kernel; multicores; multiple GPU; nonsymmetric linear equation systems; numerical stability; optimization techniques; orthogonalization strategies; Graphics processing units; Indexes; Kernel; Linear systems; Multicore processing; Optimization; Vectors;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Parallel and Distributed Processing Symposium, 2014 IEEE 28th International
  • Conference_Location
    Phoenix, AZ
  • ISSN
    1530-2075
  • Print_ISBN
    978-1-4799-3799-8
  • Type
    conf
  • DOI
    10.1109/IPDPS.2014.48
  • Filename
    6877272
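The abstract names two building blocks of CA-GMRES: a matrix powers kernel and the orthogonalization of the basis it produces. The following is a minimal sketch of both, not taken from the paper. It assumes a monomial Krylov basis [v, Av, ..., A^s v] generated by s matrix-vector products with no orthogonalization in between. It then orthogonalizes that basis with Cholesky QR (CholQR), which needs only a single Gram-matrix reduction and is one strategy used in communication-avoiding solvers. The function names `matrix_powers` and `cholqr`, the use of NumPy, and the dense test matrix are illustrative assumptions. The code runs sequentially on one CPU and does not model the on-GPU or inter-GPU communication the paper optimizes.

```python
import numpy as np


def matrix_powers(A, v, s):
    """Return the (n, s+1) monomial Krylov basis [v, Av, ..., A^s v].

    Hypothetical helper: the s products are computed back to back with no
    orthogonalization between them, which is the idea a communication-avoiding
    matrix powers kernel exploits. Only the starting vector is normalized.
    """
    n = v.shape[0]
    V = np.empty((n, s + 1))
    V[:, 0] = v / np.linalg.norm(v)
    for j in range(s):
        V[:, j + 1] = A @ V[:, j]  # one (sparse) matrix-vector product per column
    return V


def cholqr(V):
    """Cholesky QR: V = Q R with Q orthonormal and R upper triangular.

    Hypothetical helper: the only global reduction is the Gram matrix V^T V.
    Raises numpy.linalg.LinAlgError if V is too ill-conditioned for the
    Gram matrix to be numerically positive definite.
    """
    G = V.T @ V                        # single reduction over all basis vectors
    R = np.linalg.cholesky(G).T        # cholesky returns lower L with G = L L^T
    Q = np.linalg.solve(R.T, V.T).T    # Q = V R^{-1} via a triangular system
    return Q, R


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, s = 50, 4
    # Assumed test matrix: eigenvalues spread over [-1, 1] plus a small
    # strictly upper-triangular (nonsymmetric) perturbation.
    A = np.diag(np.linspace(-1.0, 1.0, n)) + 0.01 * np.triu(rng.standard_normal((n, n)), 1)
    V = matrix_powers(A, rng.standard_normal(n), s)
    Q, R = cholqr(V)
    print("orthogonality error:", np.linalg.norm(Q.T @ Q - np.eye(s + 1)))
```

The design trade-off is the one the abstract points to. CholQR replaces the s+1 separate reductions of modified Gram-Schmidt with one, but its loss of orthogonality grows roughly with the square of the condition number of V. The monomial basis becomes ill-conditioned quickly as s grows, so the choice of orthogonalization strategy affects both speed and numerical stability.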