DocumentCode
2306083
Title
Implementation of parallel sparse Cholesky factorization on GPU
Author
Dan Zou ; Yong Dou
Author_Institution
Nat. Lab. for Parallel & Distrib. Process., Nat. Univ. of Defense Technol., Changsha, China
fYear
2012
fDate
29-31 Dec. 2012
Firstpage
2228
Lastpage
2232
Abstract
Direct methods for solving large sparse symmetric positive-definite linear systems of equations are popular because of their generality and robustness. The main bottleneck is the sparse Cholesky factorization, which exhibits irregular memory access behavior and an unbalanced workload. In the past 10 years, many sparse Cholesky factorization algorithms have emerged that exploit new architectural features. However, the programming techniques currently employed on these platforms are not sufficient to implement sparse Cholesky factorization on many-core graphics processing units (GPUs), owing to mismatches between irregular problem structures and single-instruction multiple-thread GPU architectures. In the present paper, we propose a task-based software approach to parallel sparse Cholesky factorization aimed at heterogeneous computing platforms with GPU accelerators. The tasks are generated by the CPU. An efficient task-scheduling mechanism guarantees the correct ordering of task execution and ensures load-balanced execution on the GPU. Comparisons are made with the existing solver using problems arising from a range of practical applications. The experimental results show that the proposed approach substantially improves the performance of sparse Cholesky factorization on the GPU, with speedups of 2.7× to 4×.
Keywords
graphics processing units; linear systems; mathematics computing; matrix decomposition; parallel programming; scheduling; task analysis; GPU accelerators; architectural features; heterogeneous computing platforms; irregular memory access behavior; irregular problem structures; large sparse symmetric positive-definite linear systems; load balanced execution; many-core graphics processing units; parallel sparse Cholesky factorization algorithm; programming techniques; single-instruction multiple-thread GPU architectures; task execution ordering; task-based software approach; task-scheduling mechanism; unbalanced workload; GPU; sparse Cholesky factorization
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Science and Network Technology (ICCSNT), 2012 2nd International Conference on
Conference_Location
Changchun
Print_ISBN
978-1-4673-2963-7
Type
conf
DOI
10.1109/ICCSNT.2012.6526361
Filename
6526361