Regional Information Center for Science and Technology - Tile QR factorization with parallel panel processing for multicore architectures

DocumentCode :

2441340

Title :

Tile QR factorization with parallel panel processing for multicore architectures

Author :

Hadri, Bilel ; Ltaief, Hatem ; Agullo, Emmanuel ; Dongarra, Jack

Author_Institution :

Dept. of Electr. Eng. & Comput. Sci., Univ. of Tennessee, Knoxville, TN, USA

fYear :

2010

fDate :

19-23 April 2010

Firstpage :

Lastpage :

Abstract :

To exploit the potential of multicore architectures, recent dense linear algebra libraries have used tile algorithms, which consist of scheduling a Directed Acyclic Graph (DAG) of fine-grained tasks, where nodes represent tasks (either a panel factorization or the update of a block-column) and edges represent the dependencies among them. Although past approaches already achieve high performance on moderate and large square matrices, their way of processing a panel in sequence leads to limited performance when factorizing tall and skinny matrices or small square matrices. We present a new fully asynchronous method for computing a QR factorization on shared-memory multicore architectures that overcomes this bottleneck. Our contribution is to adapt an existing algorithm that performs a panel factorization in parallel (named Communication-Avoiding QR and initially designed for distributed-memory machines) to the context of tile algorithms using asynchronous computations. An experimental study shows significant improvement (up to almost 10 times faster) compared to state-of-the-art approaches. We aim to eventually incorporate this work into the Parallel Linear Algebra for Scalable Multi-core Architectures (PLASMA) library.
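The bottleneck described in the abstract can be illustrated with a small sketch that models a single panel of p tiles as a DAG and compares the length of its critical path under two schedules. This is not the authors' implementation: the task names `factor` and `merge`, the unit cost per task, and the binary reduction tree are all assumptions made for illustration. The sequential variant folds tiles into the first one in order, while the parallel variant factors every tile independently and merges the results pairwise.

```python
def flat_tree_panel(p):
    """Sequential panel (assumed model): factor tile 0, then fold in
    tiles 1..p-1 one at a time. Returns {task: [dependencies]}."""
    deps = {("factor", 0): []}
    prev = ("factor", 0)
    for i in range(1, p):
        task = ("merge", 0, i)
        deps[task] = [prev]  # each merge waits for the previous one
        prev = task
    return deps


def binary_tree_panel(p):
    """Parallel panel (assumed model): factor all tiles independently,
    then merge pairwise along a binary tree."""
    deps = {("factor", i): [] for i in range(p)}
    last = {i: ("factor", i) for i in range(p)}  # latest task on tile i
    stride = 1
    while stride < p:
        for i in range(0, p - stride, 2 * stride):
            j = i + stride
            task = ("merge", i, j)
            deps[task] = [last[i], last[j]]
            last[i] = task
        stride *= 2
    return deps


def critical_path(deps):
    """Length of the longest dependency chain, counting each task as 1."""
    memo = {}

    def depth(task):
        if task not in memo:
            memo[task] = 1 + max((depth(d) for d in deps[task]), default=0)
        return memo[task]

    return max(depth(t) for t in deps)


if __name__ == "__main__":
    for p in (4, 8, 64):
        flat = critical_path(flat_tree_panel(p))
        tree = critical_path(binary_tree_panel(p))
        print(f"p={p}: sequential depth={flat}, parallel depth={tree}")
```

Under this simplified model the sequential panel has a critical path of p tasks, while the tree-based panel needs about 1 + log2(p) steps at the cost of extra merge tasks. That gap is largest for tall and skinny matrices, where the panel dominates the work and there are few block-column updates to hide it behind.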

Keywords :

directed graphs; distributed memory systems; linear algebra; matrix decomposition; parallel architectures; processor scheduling; shared memory systems; PLASMA library; asynchronous computations; asynchronous method; communication-avoiding QR; dense linear algebra library; directed acyclic graph; distributed-memory machines; panel factorization; parallel linear algebra; parallel panel processing; scalable multicore architectures; scheduling; shared-memory multicore architectures; skinny matrices; small square matrices; tile QR factorization; tile algorithms; Algorithm design and analysis; Computer architecture; Concurrent computing; Context; Distributed computing; Libraries; Linear algebra; Multicore processing; Scheduling algorithm; Tiles; Communication Avoiding; Dynamic scheduling; Multicore; QR factorization; Tile Algorithms;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Parallel & Distributed Processing (IPDPS), 2010 IEEE International Symposium on

Conference_Location :

Atlanta, GA

ISSN :

1530-2075

Print_ISBN :

978-1-4244-6442-5

Type :

conf

DOI :

10.1109/IPDPS.2010.5470443

Filename :

5470443

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2441340