Title :
Reconstructing Householder Vectors from Tall-Skinny QR
Author :
Ballard, Grey ; Demmel, J. ; Grigori, Laura ; Jacquelin, Mathias ; Hong Diep Nguyen ; Solomonik, Edgar
Author_Institution :
Sandia Nat. Labs., Livermore, CA, USA
Abstract :
The Tall-Skinny QR (TSQR) algorithm is more communication efficient than the standard Householder algorithm for QR decomposition of matrices with many more rows than columns. However, TSQR produces a different representation of the orthogonal factor and therefore requires more software development to support the new representation. Further, implicitly applying the orthogonal factor to the trailing matrix in the context of factoring a square matrix is more complicated and costly than with the Householder representation. We show how to perform TSQR and then reconstruct the Householder vector representation with the same asymptotic communication efficiency and little extra computational cost. We demonstrate the high performance and numerical stability of this algorithm both theoretically and empirically. The new Householder reconstruction algorithm allows us to design more efficient parallel QR algorithms, with significantly lower latency cost compared to Householder QR and lower bandwidth and latency costs compared with Communication-Avoiding QR (CAQR) algorithm. As a result, our final parallel QR algorithm outperforms ScaLAPACK and Elemental implementations of Householder QR and our implementation of CAQR on the Hopper Cray XE6 NERSC system. We also provide algorithmic improvements to the ScaLAPACK and CAQR algorithms.
Keywords :
mathematics computing; matrix decomposition; numerical stability; parallel algorithms; software engineering; vectors; CAQR algorithm; Hopper Cray XE6 NERSC system; QR matrix decomposition; TSQR algorithm; asymptotic communication efficiency; communication-avoiding QR algorithm; householder algorithm; householder vector reconstruction; householder vector representation; latency costs; numerical stability; orthogonal factor representation; parallel QR algorithms; software development; square matrix factoring; tall-skinny QR algorithm; trailing matrix; Algorithm design and analysis; Bandwidth; Computational efficiency; Matrix decomposition; Program processors; Standards; Vectors; Householder QR; TSQR; communication-avoiding;
Conference_Titel :
Parallel and Distributed Processing Symposium, 2014 IEEE 28th International
Conference_Location :
Phoenix, AZ
Print_ISBN :
978-1-4799-3799-8
DOI :
10.1109/IPDPS.2014.120