DocumentCode :
2441340
Title :
Tile QR factorization with parallel panel processing for multicore architectures
Author :
Hadri, Bilel ; Ltaief, Hatem ; Agullo, Emmanuel ; Dongarra, Jack
Author_Institution :
Dept. of Electr. Eng. & Comput. Sci., Univ. of Tennessee, Knoxville, TN, USA
fYear :
2010
fDate :
19-23 April 2010
Firstpage :
1
Lastpage :
10
Abstract :
To exploit the potential of multicore architectures, recent dense linear algebra libraries have used tile algorithms, which consist in scheduling a Directed Acyclic Graph (DAG) of tasks of fine granularity where nodes represent tasks, either panel factorization or update of a block-column, and edges represent dependencies among them. Although past approaches already achieve high performance on moderate and large square matrices, their way of processing a panel in sequence leads to limited performance when factorizing tall and skinny matrices or small square matrices. We present a new fully asynchronous method for computing a QR factorization on shared-memory multicore architectures that overcomes this bottleneck. Our contribution is to adapt an existing algorithm that performs a panel factorization in parallel (named Communication-Avoiding QR and initially designed for distributed-memory machines), to the context of tile algorithms using asynchronous computations. An experimental study shows significant improvement (up to almost 10 times faster) compared to state-of-the-art approaches. We aim to eventually incorporate this work into the Parallel Linear Algebra for Scalable Multi-core Architectures (PLASMA) library.
Keywords :
directed graphs; distributed memory systems; linear algebra; matrix decomposition; parallel architectures; processor scheduling; shared memory systems; PLASMA library; asynchronous computations; asynchronous method; communication-avoiding QR; dense linear algebra library; directed acyclic graph; distributed-memory machines; panel factorization; parallel linear algebra; parallel panel processing; scalable multicore architectures; scheduling; shared-memory multicore architectures; skinny matrices; small square matrices; tile QR factorization; tile algorithms; Algorithm design and analysis; Computer architecture; Concurrent computing; Context; Distributed computing; Libraries; Linear algebra; Multicore processing; Scheduling algorithm; Tiles; Communication Avoiding; Dynamic scheduling; Multicore; QR factorization; Tile Algorithms;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel & Distributed Processing (IPDPS), 2010 IEEE International Symposium on
Conference_Location :
Atlanta, GA
ISSN :
1530-2075
Print_ISBN :
978-1-4244-6442-5
Type :
conf
DOI :
10.1109/IPDPS.2010.5470443
Filename :
5470443