• DocumentCode
    576809
  • Title
    Tuning Block Size for QR Factorization on CPU-GPU Hybrid Systems
  • Author
    Tsai, Yaohung M.; Wang, Weichung; Chen, Ray-Bing
  • Author_Institution
    Dept. of Math., Nat. Taiwan Univ., Taipei, Taiwan
  • fYear
    2012
  • fDate
    20-22 Sept. 2012
  • Firstpage
    205
  • Lastpage
    211
  • Abstract
    In CPU-GPU hybrid systems, the QR factorization in MAGMA causes CPU idle time because it uses a fixed block size. To improve the computational efficiency of MAGMA QR factorization, we propose a variable block size auto-tuning scheme for CPU-GPU hybrid systems. First, we fit the CPU and GPU costs of MAGMA QR factorization with two independent regression models, which serve as the CPU and GPU performance models. Next, we propose a block size optimization scheme that tunes the block size adaptively to minimize a cost objective function. The cost objective function is designed to balance the workloads between the CPU and GPU based on the performance models. Finally, numerical results demonstrate the performance gains of the proposed QR factorization algorithm.
  • Keywords
    graphics processing units; matrix decomposition; multiprocessing systems; CPU costs; CPU performance models; CPU-GPU hybrid systems; GPU costs; GPU performance models; MAGMA QR factorization; block size optimization scheme; computational efficiency improvement; fixed block size; independent regression models; matrix algebra on GPU and multicore architectures; tuning block size; variable block size auto-tuning scheme; workload balancing; Algorithms; Central Processing Unit; Computer architecture; Graphics processing units; Matrix decomposition; Vectors
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    2012 IEEE 6th International Symposium on Embedded Multicore SoCs (MCSoC)
  • Conference_Location
    Aizu-Wakamatsu
  • Print_ISBN
    978-1-4673-2535-6
  • Electronic_ISBN
    978-0-7695-4800-5
  • Type
    conf
  • DOI
    10.1109/MCSoC.2012.32
  • Filename
    6354700
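The scheme summarized in the abstract (regression-based CPU and GPU cost models feeding a block-size objective that balances the two devices) can be illustrated with a minimal Python sketch. It is not the authors' method: the helper names `fit_line`, `cpu_feature`, `gpu_feature`, and `choose_block_size` are hypothetical, the flop-count features (about m·nb² for the CPU panel, m·nb·n_rest for the GPU trailing update) and the objective max(T_cpu, T_gpu)/nb are assumptions, and the timing data are synthetic, generated from made-up coefficients rather than measured. It also assumes the usual hybrid split in which the CPU factors the panel while the GPU updates the trailing matrix, so one lookahead step costs roughly the slower of the two.

```python
# Hypothetical sketch: choose a QR block size by balancing modeled CPU and GPU
# costs. Features, objective, and data are illustrative assumptions only.

def fit_line(xs, ys):
    """Ordinary least squares for y ~ c0 + c1 * x; returns (c0, c1)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    c1 = sxy / sxx
    return my - c1 * mx, c1

def cpu_feature(m, nb):
    # Assumed CPU panel-factorization work: proportional to m * nb^2.
    return m * nb * nb

def gpu_feature(m, n_rest, nb):
    # Assumed GPU trailing-matrix update work: proportional to m * nb * n_rest.
    return m * nb * n_rest

def choose_block_size(cpu_model, gpu_model, m, n_rest, candidates):
    """Return the candidate nb minimizing max(CPU, GPU) time per factored column.

    The CPU panel and GPU update are assumed to overlap, so a step costs about
    the slower of the two; dividing by nb compares different step widths.
    """
    def objective(nb):
        t_cpu = cpu_model[0] + cpu_model[1] * cpu_feature(m, nb)
        t_gpu = gpu_model[0] + gpu_model[1] * gpu_feature(m, n_rest, nb)
        return max(t_cpu, t_gpu) / nb
    return min(candidates, key=objective)

# Synthetic timings from known (made-up) coefficients, for illustration only.
CPU_TRUE = (1e-4, 2e-10)  # seconds: fixed overhead, cost per unit of cpu_feature
GPU_TRUE = (5e-3, 1e-11)  # seconds: fixed overhead, cost per unit of gpu_feature

cpu_x, cpu_y, gpu_x, gpu_y = [], [], [], []
for m in (1024, 2048, 4096):
    for nb in (32, 64, 128):
        xc = cpu_feature(m, nb)
        xg = gpu_feature(m, m, nb)  # square trailing matrix: n_rest = m
        cpu_x.append(xc)
        cpu_y.append(CPU_TRUE[0] + CPU_TRUE[1] * xc)
        gpu_x.append(xg)
        gpu_y.append(GPU_TRUE[0] + GPU_TRUE[1] * xg)

cpu_model = fit_line(cpu_x, cpu_y)
gpu_model = fit_line(gpu_x, gpu_y)
best_nb = choose_block_size(cpu_model, gpu_model, m=4096, n_rest=4096,
                            candidates=(32, 64, 128, 192, 256))
print(best_nb)  # 192
```

Calling `choose_block_size` again at each step as the trailing matrix shrinks (smaller `m` and `n_rest`) yields a block size that varies over the factorization, whereas a single fixed nb corresponds to the baseline the abstract describes.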