Regional Information Center for Science and Technology (RICEST) - Scheduling of QR Factorization Algorithms on SMP and Multi-Core Architectures

DocumentCode :

3057614

Title :

Scheduling of QR Factorization Algorithms on SMP and Multi-Core Architectures

Author :

Quintana-Ortí, Gregorio ; Quintana-Ortí, Enrique S. ; Chan, Ernie ; Van De Geijn, Robert A. ; Van Zee, Field G.

Author_Institution :

Univ. Jaume I, Castellon

Year :

2008

Date :

13-15 Feb. 2008

Firstpage :

301

Lastpage :

310

Abstract :

This paper examines the scalable parallel implementation of the QR factorization of a general matrix, targeting SMP and multi-core architectures. Two implementations of algorithms-by-blocks are presented. Each implementation views a block of a matrix as the fundamental unit of data, and likewise, operations over these blocks as the primary unit of computation. The first is a conventional blocked algorithm similar to those included in libFLAME and LAPACK but expressed in a way that allows operations in the so-called critical path of execution to be computed as soon as their dependencies are satisfied. The second algorithm captures a higher degree of parallelism with an approach based on Givens rotations while preserving the performance benefits of algorithms based on blocked Householder transformations. We show that the implementation effort is greatly simplified by expressing the algorithms in code with the FLAME/FLASH API, which allows matrices stored by blocks to be viewed and managed as matrices of matrix blocks. The SuperMatrix run-time system utilizes FLASH to assemble and represent matrices but also provides out-of-order scheduling of operations that is transparent to the programmer. Scalability of the solution is demonstrated on a ccNUMA platform with 16 processors and an SMP architecture with 16 cores.
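
The idea of treating block operations as tasks that run as soon as their dependencies are satisfied can be illustrated with a small sketch. The Python below is not the authors' code and uses neither the FLAME/FLASH API nor the SuperMatrix run-time; the task labels (FACTOR, APPLY, ELIM, UPDATE) and all function names are hypothetical, chosen only for illustration. It lists the block tasks of a tiled QR sweep in sequential order, loosely in the spirit of the second algorithm described above, records which blocks each task reads and writes, and derives the earliest step at which each task could run if enough workers were available. No numerical work is done, and dependencies are tracked conservatively at whole-block granularity.

```python
from collections import defaultdict


def tiled_qr_tasks(p, q):
    """Return tasks as (label, reads, writes) in sequential program order.

    The matrix is assumed to be a p x q grid of blocks; labels are
    hypothetical and only name the kind of block operation.
    """
    tasks = []
    for k in range(min(p, q)):
        # Factor the diagonal block (k, k).
        tasks.append((f"FACTOR({k},{k})", set(), {(k, k)}))
        # Apply that factorization to the blocks to its right.
        for j in range(k + 1, q):
            tasks.append((f"APPLY({k},{j})", {(k, k)}, {(k, j)}))
        for i in range(k + 1, p):
            # Annihilate block (i, k) against the diagonal block.
            tasks.append((f"ELIM({i},{k})", set(), {(k, k), (i, k)}))
            # Update the coupled pair of blocks in rows k and i.
            for j in range(k + 1, q):
                tasks.append(
                    (f"UPDATE({i},{j};{k})", {(i, k)}, {(k, j), (i, j)})
                )
    return tasks


def build_dependencies(tasks):
    """For each task, collect the indices of earlier tasks it must wait for."""
    last_writer = {}              # block -> index of the last task that wrote it
    readers = defaultdict(list)   # block -> tasks that read it since that write
    deps = []
    for t, (_, reads, writes) in enumerate(tasks):
        d = set()
        # Read-after-write and write-after-write dependencies.
        for block in reads | writes:
            if block in last_writer:
                d.add(last_writer[block])
        # Write-after-read dependencies (conservative, whole-block).
        for block in writes:
            d.update(readers[block])
        deps.append(d)
        for block in reads:
            readers[block].append(t)
        for block in writes:
            last_writer[block] = t
            readers[block] = []
    return deps


def schedule_levels(deps):
    """Earliest step for each task, assuming unlimited workers."""
    levels = []
    for d in deps:
        # Program order is a valid topological order, so every
        # dependency already has a level when we reach this task.
        levels.append(1 + max((levels[i] for i in d), default=0))
    return levels


if __name__ == "__main__":
    tasks = tiled_qr_tasks(3, 3)
    levels = schedule_levels(build_dependencies(tasks))
    for (label, _, _), level in zip(tasks, levels):
        print(f"step {level}: {label}")
    print(f"{len(tasks)} tasks, {max(levels)} steps on the critical path")
```

Running the script on a 3 x 3 grid shows several tasks sharing the same step, including a diagonal factorization from the next iteration that becomes ready before the current iteration's updates have all finished. That kind of reordering is what an out-of-order scheduler can exploit without the algorithm's author writing it explicitly.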

Keywords :

parallel programming; scheduling; SuperMatrix run-time system; algorithms-by-blocks; blocked Householder transformations; multi-core architectures; out-of-order scheduling; Computer architecture; Concurrent computing; Distributed computing; Fires; Linear algebra; Out of order; Parallel processing; Processor scheduling; Scalability; Scheduling algorithm; QR factorization; dynamic scheduling; high-performance; linear algebra libraries; out-of-order execution

Language :

English

Publisher :

IEEE

Conference_Title :

Parallel, Distributed and Network-Based Processing, 2008. PDP 2008. 16th Euromicro Conference on

Conference_Location :

Toulouse

ISSN :

1066-6192

Print_ISBN :

978-0-7695-3089-5

Type :

conf

DOI :

10.1109/PDP.2008.37

Filename :

4457137

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3057614