• DocumentCode
    2306083
  • Title
    Implementation of parallel sparse Cholesky factorization on GPU
  • Author
    Dan Zou ; Yong Dou
  • Author_Institution
    Nat. Lab. for Parallel & Distrib. Process., Nat. Univ. of Defense Technol., Changsha, China
  • fYear
    2012
  • fDate
    29-31 Dec. 2012
  • Firstpage
    2228
  • Lastpage
    2232
  • Abstract
    Direct methods for solving large sparse symmetric positive-definite linear systems of equations are popular because of their generality and robustness. The main bottleneck is the sparse Cholesky factorization, which exhibits irregular memory access behavior and an unbalanced workload. In the past 10 years, many sparse Cholesky factorization algorithms have emerged that exploit new architectural features. However, the programming techniques currently employed on these platforms are not sufficient to implement sparse Cholesky factorization on many-core graphics processing units (GPUs), because irregular problem structures are a poor match for single-instruction multiple-thread GPU architectures. In the present paper, we propose a task-based software approach to parallel sparse Cholesky factorization aimed at heterogeneous computing platforms with GPU accelerators. The tasks are generated by the CPU. An efficient task-scheduling mechanism guarantees the correct ordering of task execution and ensures load-balanced execution on the GPU. Comparisons are made with the existing solver using problems arising from a range of practical applications. The experimental results show that the proposed approach can substantially improve the performance of sparse Cholesky factorization on the GPU, with a speedup of 2.7× to 4×.
  • Keywords
    graphics processing units; linear systems; mathematics computing; matrix decomposition; parallel programming; scheduling; task analysis; GPU accelerators; architectural features; heterogeneous computing platforms; irregular memory access behavior; irregular problem structures; large sparse symmetric positive-definite linear systems; load balanced execution; many-core graphics processing units; parallel sparse Cholesky factorization algorithm; programming techniques; single-instruction multiple-thread GPU architectures; task execution ordering; task-based software approach; task-scheduling mechanism; unbalanced workload; GPU; sparse Cholesky factorization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Computer Science and Network Technology (ICCSNT), 2012 2nd International Conference on
  • Conference_Location
    Changchun
  • Print_ISBN
    978-1-4673-2963-7
  • Type
    conf
  • DOI
    10.1109/ICCSNT.2012.6526361
  • Filename
    6526361
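
The abstract describes tasks that are generated on the CPU and executed in an order that respects their dependencies. As a rough illustration of that idea only, the Python sketch below splits a column-oriented Cholesky factorization into the textbook `cdiv` (scale a column) and `cmod` (update one column with another) tasks and runs them from a ready queue. It is not the authors' scheduler or GPU code: the function name `task_based_cholesky`, the dense list-of-lists storage, the task tuples, and the serial queue that stands in for GPU workers are all assumptions made for this example.

```python
import math
from collections import deque


def task_based_cholesky(A):
    """Factor a symmetric positive-definite matrix A (list of lists) as L * L^T.

    Hypothetical sketch: tasks are ("cdiv", k) and ("cmod", j, k); a serial
    ready queue stands in for the parallel workers of a real implementation.
    Returns (L, order), where order is the sequence of executed tasks.
    """
    n = len(A)
    # Start from the lower triangle of A; it is overwritten with L.
    L = [[float(A[i][j]) if i >= j else 0.0 for j in range(n)] for i in range(n)]

    # Symbolic step: nonzero pattern of L, including fill-in.
    pattern = [[i >= j and A[i][j] != 0 for j in range(n)] for i in range(n)]
    for k in range(n):
        rows = [i for i in range(k + 1, n) if pattern[i][k]]
        for a in rows:
            for b in rows:
                if a >= b:
                    pattern[a][b] = True

    # pending[j] = number of cmod(j, k) updates column j still waits for.
    pending = [sum(1 for k in range(j) if pattern[j][k]) for j in range(n)]

    # Columns with no pending updates can be scaled immediately.
    ready = deque(("cdiv", j) for j in range(n) if pending[j] == 0)
    order = []

    while ready:
        task = ready.popleft()
        order.append(task)
        if task[0] == "cdiv":
            k = task[1]
            L[k][k] = math.sqrt(L[k][k])
            for i in range(k + 1, n):
                if pattern[i][k]:
                    L[i][k] /= L[k][k]
            # Column k is final, so every update that reads it becomes ready.
            for j in range(k + 1, n):
                if pattern[j][k]:
                    ready.append(("cmod", j, k))
        else:
            _, j, k = task
            for i in range(j, n):
                if pattern[i][k]:
                    L[i][j] -= L[i][k] * L[j][k]
            pending[j] -= 1
            if pending[j] == 0:
                ready.append(("cdiv", j))

    return L, order
```

For a matrix such as `[[4, 0, 2], [0, 9, 3], [2, 3, 11]]`, columns 0 and 1 have no dependency on each other, so both `cdiv` tasks are ready at the start; this is the kind of independent work that a parallel scheduler could dispatch concurrently.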