Adapting communication-avoiding LU and QR factorizations to multicore architectures

Author

Donfack, Simplice ; Grigori, Laura ; Gupta, Alok Kumar

Author_Institution

INRIA Saclay-Ile de France, Univ. Paris-Sud 11, Orsay, France

fYear

2010

fDate

19-23 April 2010

Firstpage

1

Lastpage

10

Abstract

This paper studies algorithms for the LU and QR factorizations of dense matrices. Two communication-optimal algorithms, communication-avoiding LU (CALU) and communication-avoiding QR (CAQR), were recently introduced for distributed-memory architectures. We discuss two algorithms based on CALU and CAQR that are adapted to multicore architectures. They combine the communication-reducing ideas of communication-avoiding algorithms with asynchronism and dynamic task scheduling. For tall and skinny matrices, that is, matrices with many more rows than columns, the two algorithms outperform the corresponding routines from the Intel MKL vendor library on a dual-socket, quad-core machine based on the Intel Xeon EMT64 processor and on a four-socket, quad-core machine based on the AMD Opteron processor. For these matrices, multithreaded CALU outperforms the corresponding dgetrf routine from the Intel MKL library by up to a factor of 2.3 and the corresponding dgetrf routine from the ACML library by up to a factor of 5, while multithreaded CAQR outperforms the corresponding dgeqrf routine from the MKL library by a factor of 5.3.
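
The reduction idea behind communication-avoiding QR on a tall and skinny matrix can be sketched as follows. This is a minimal illustration and not the authors' multithreaded implementation: it computes only the R factor, uses modified Gram-Schmidt for each local factorization, runs sequentially, and assumes every row block has at least as many rows as columns and full column rank. The names qr_r_factor, tsqr_r and gram are hypothetical helpers introduced here.

```python
import math
import random


def qr_r_factor(block):
    """Return the n x n upper-triangular R of an m x n block (m >= n,
    full column rank) via modified Gram-Schmidt. Q is discarded."""
    m, n = len(block), len(block[0])
    cols = [[block[i][j] for i in range(m)] for j in range(n)]
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        r[j][j] = math.sqrt(sum(x * x for x in cols[j]))
        q = [x / r[j][j] for x in cols[j]]
        # Remove the component along q from every remaining column.
        for k in range(j + 1, n):
            r[j][k] = sum(a * b for a, b in zip(q, cols[k]))
            cols[k] = [b - r[j][k] * a for a, b in zip(q, cols[k])]
    return r


def tsqr_r(matrix, num_blocks):
    """R factor of a tall-skinny matrix using a binary reduction tree."""
    m = len(matrix)
    step = m // num_blocks
    bounds = [(i * step, m if i == num_blocks - 1 else (i + 1) * step)
              for i in range(num_blocks)]
    # Leaves: each row block is factored independently (one task per block).
    level = [qr_r_factor(matrix[lo:hi]) for lo, hi in bounds]
    # Inner nodes: stack two R factors on top of each other and factor again.
    while len(level) > 1:
        nxt = [qr_r_factor(level[i] + level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # odd block out moves up unchanged
        level = nxt
    return level[0]


def gram(mat):
    """Return mat^T * mat, used to compare R^T R with A^T A."""
    n = len(mat[0])
    return [[sum(row[i] * row[j] for row in mat) for j in range(n)]
            for i in range(n)]


rng = random.Random(0)
A = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(16)]
R = tsqr_r(A, num_blocks=4)
```

Because each leaf factorization touches only its own row block, the leaves are independent tasks, and only small n x n triangular factors move between levels of the tree. A dynamic scheduler, as the abstract describes, could run such tasks as soon as their inputs are ready.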

Keywords

matrix algebra; memory architecture; multi-threading; multiprocessing programs; AMD Opteron processor; Intel MKL; Intel Xeon EMT64 processor; communication optimal algorithms; communication-avoiding LU factorizations; communication-avoiding QR factorizations; dense matrices; distributed memory architectures; multicore architectures; quad-core machine; Binary trees; Dynamic scheduling; Iterative algorithms; Libraries; Matrix decomposition; Multicore processing; Processor scheduling; Scheduling algorithm; Yarn; LU and QR factorizations; communication avoiding algorithms

fLanguage

English

Publisher

IEEE

Conference_Titel

Parallel & Distributed Processing (IPDPS), 2010 IEEE International Symposium on

Conference_Location

Atlanta, GA

ISSN

1530-2075

Print_ISBN

978-1-4244-6442-5

Type

conf

DOI

10.1109/IPDPS.2010.5470348

Filename

5470348