DocumentCode
2439310
Title
Adapting communication-avoiding LU and QR factorizations to multicore architectures
Author
Donfack, Simplice ; Grigori, Laura ; Gupta, Alok Kumar
Author_Institution
INRIA Saclay-Ile de France, Univ. Paris-Sud 11, Orsay, France
fYear
2010
fDate
19-23 April 2010
Firstpage
1
Lastpage
10
Abstract
In this paper we study algorithms for performing the LU and QR factorizations of dense matrices. Recently, two communication-optimal algorithms, referred to as communication-avoiding LU (CALU) and communication-avoiding QR (CAQR), have been introduced for distributed memory architectures. We discuss two algorithms based on CAQR and CALU that are adapted to multicore architectures. They combine the communication-reducing ideas of communication-avoiding algorithms with asynchronism and dynamic task scheduling. For tall and skinny matrices, that is, matrices with many more rows than columns, the two algorithms outperform the corresponding routines from the Intel MKL vendor library on a dual-socket, quad-core machine based on the Intel Xeon EMT64 processor and on a four-socket, quad-core machine based on the AMD Opteron processor. For these matrices, multithreaded CALU outperforms the corresponding routine dgetrf from the Intel MKL library by up to a factor of 2.3 and the corresponding routine dgetrf from the ACML library by up to a factor of 5, while multithreaded CAQR outperforms the corresponding dgeqrf routine from the MKL library by a factor of 5.3.
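As an illustration of why tall and skinny matrices suit communication-avoiding QR, the following is a minimal serial sketch of the tree-based reduction commonly used for the panel factorization (often called TSQR). It is not the authors' multithreaded implementation: the abstract does not spell out the reduction tree or the task scheduler, so the binary tree, the R-only output, and the names `householder_r`, `tsqr_r`, and `gram` are all assumptions made for this example. Each row block is factored independently, and pairs of small R factors are then stacked and refactored until a single R remains.

```python
import math


def householder_r(a):
    """Return the n x n upper-triangular R factor of an m x n matrix (m >= n)."""
    m, n = len(a), len(a[0])
    r = [row[:] for row in a]
    for k in range(n):
        # Build the Householder vector that zeroes column k below the diagonal.
        norm = math.sqrt(sum(r[i][k] ** 2 for i in range(k, m)))
        if norm == 0.0:
            continue
        alpha = -norm if r[k][k] >= 0 else norm
        v = [r[i][k] for i in range(k, m)]
        v[0] -= alpha
        vnorm2 = sum(x * x for x in v)
        if vnorm2 == 0.0:
            continue
        # Apply H = I - 2 v v^T / (v^T v) to columns k..n-1.
        for j in range(k, n):
            dot = sum(v[i] * r[k + i][j] for i in range(len(v)))
            scale = 2.0 * dot / vnorm2
            for i in range(len(v)):
                r[k + i][j] -= scale * v[i]
    # Keep only the top n rows and force exact zeros below the diagonal.
    return [[r[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]


def tsqr_r(a, num_blocks):
    """R factor of a tall, skinny matrix via a binary reduction tree (assumed shape)."""
    m, n = len(a), len(a[0])
    rows_per_block = m // num_blocks
    assert rows_per_block >= n, "each row block needs at least n rows"
    bounds = [i * rows_per_block for i in range(num_blocks)] + [m]
    # Leaves: factor each row block independently (these could run in parallel).
    rs = [householder_r(a[bounds[i]:bounds[i + 1]]) for i in range(num_blocks)]
    # Tree: stack pairs of R factors (2n x n) and refactor until one remains.
    while len(rs) > 1:
        merged = [householder_r(rs[i] + rs[i + 1]) for i in range(0, len(rs) - 1, 2)]
        if len(rs) % 2 == 1:
            merged.append(rs[-1])  # an unpaired factor moves up unchanged
        rs = merged
    return rs[0]


def gram(a):
    """Return a^T a, used here only to check that R^T R matches A^T A."""
    n = len(a[0])
    return [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]
```

In this sketch, `tsqr_r(a, 4)` on a 16 x 3 matrix factors four 4 x 3 blocks at the leaves and then performs two levels of 6 x 3 factorizations. Only the small R factors move between levels, which is the source of the reduced communication.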
Keywords
matrix algebra; memory architecture; multi-threading; multiprocessing programs; AMD Opteron processor; Intel MKL; Intel Xeon EMT64 processor; communication optimal algorithms; communication-avoiding LU factorizations; communication-avoiding QR factorizations; dense matrices; distributed memory architectures; multicore architectures; quad-core machine; Binary trees; Dynamic scheduling; Iterative algorithms; Libraries; Matrix decomposition; Memory architecture; Multicore processing; Processor scheduling; Scheduling algorithm; Yarn; LU and QR factorizations; communication avoiding algorithms
fLanguage
English
Publisher
IEEE
Conference_Titel
Parallel & Distributed Processing (IPDPS), 2010 IEEE International Symposium on
Conference_Location
Atlanta, GA
ISSN
1530-2075
Print_ISBN
978-1-4244-6442-5
Type
conf
DOI
10.1109/IPDPS.2010.5470348
Filename
5470348