DocumentCode
2958675
Title
New Scheduling Strategies and Hybrid Programming for a Parallel Right-looking Sparse LU Factorization Algorithm on Multicore Cluster Systems
Author
Yamazaki, Ichitaro ; Li, Xiaoye S.
Author_Institution
Comput. Res. Div., Lawrence Berkeley Nat. Lab., Berkeley, CA, USA
fYear
2012
fDate
21-25 May 2012
Firstpage
619
Lastpage
630
Abstract
Parallel sparse LU factorization is a key computational kernel in the solution of large-scale linear systems of equations. In this paper, we propose two strategies to address some scalability issues of a factorization algorithm on modern HPC systems. The first strategy is at the algorithmic level: we schedule independent tasks as soon as possible to reduce the idle time and the critical path of the algorithm. We demonstrate, using thousands of cores, that our new scheduling strategy reduces the runtime by nearly a factor of three compared with a state-of-the-art pipelined factorization algorithm. The second strategy is at both the programming and architecture levels: we incorporate lightweight OpenMP threads in each MPI process to reduce both the memory and time overheads of a pure MPI implementation on many-core NUMA architectures. Using this hybrid programming paradigm, we obtain a significant reduction in memory usage while achieving a parallel efficiency competitive with that of a pure MPI paradigm. As a result, in cases where the pure MPI paradigm failed because of the per-core memory constraint, the hybrid paradigm could utilize more cores on each node and reduce the factorization time on the same number of nodes. We present an extensive performance analysis of the new strategies using thousands of cores on two leading HPC systems, a Cray-XE6 and an IBM iDataPlex.
Keywords
application program interfaces; mathematics computing; matrix decomposition; message passing; multiprocessing systems; parallel architectures; scheduling; Cray-XE6; HPC system; IBM iDataPlex; MPI process; algorithmic-level; architecture-level; computational kernel; factorization time reduction; hybrid programming paradigm; independent task scheduling; large-scale linear system; light-weight OpenMP thread; many-core NUMA architecture; memory overhead reduction; memory usage reduction; multicore cluster system; parallel efficiency; parallel right-looking sparse LU factorization algorithm; per-core memory constraint; pipelined factorization algorithm; programming-level; runtime reduction; scalability issue; time overhead reduction; Linear systems; Memory management; Multicore processing; Processor scheduling; Programming; Scheduling
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel & Distributed Processing Symposium (IPDPS), 2012 IEEE 26th International
Conference_Location
Shanghai
ISSN
1530-2075
Print_ISBN
978-1-4673-0975-2
Type
conf
DOI
10.1109/IPDPS.2012.63
Filename
6267864