Title :
Algorithm/Architecture Codesign of Low Power and High Performance Linear Algebra Compute Fabrics
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of Texas at Austin, Austin, TX, USA
Abstract :
We show the design of specialized compute fabrics that maintain the efficiency of full-custom hardware while providing enough flexibility to execute a whole class of coarse-grain linear algebra operations. The broad vision of this project is to develop integrated, specialized hardware/software solutions that are co-optimized and co-designed across all layers, from the basic hardware foundations all the way up to the application level via standard linear algebra packages. We have designed a specialized linear algebra processor (LAP) that can perform level-3 BLAS as well as more complex LAPACK-level operations such as Cholesky, LU (with partial pivoting), and QR factorizations. We present a power/performance model that compares state-of-the-art CPUs and GPUs with our design. Our power model reveals sources of inefficiency in CPUs and GPUs, and our LAP design demonstrates how to overcome them. Compared to other conventional architectures for linear algebra applications, LAP is orders of magnitude more power efficient. Based on our estimates, single- and double-precision efficiencies of up to 55 and 25 GFLOPS/W, respectively, are achievable on a single chip in standard 45 nm technology.
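The GFLOPS/W figures above can also be read as energy per floating-point operation: since 1 W = 1 J/s, an efficiency of E GFLOPS/W corresponds to 10^9 · E FLOPs per joule, i.e. 1000/E picojoules per FLOP. The short Python sketch below performs that conversion for the two estimated LAP efficiencies; it is only a unit-conversion aid, and the helper name `picojoules_per_flop` is hypothetical, introduced here for illustration.

```python
# Convert an energy-efficiency figure in GFLOPS/W into energy per FLOP.
# Since 1 W = 1 J/s, E GFLOPS/W means E * 1e9 floating-point operations per joule.
# `picojoules_per_flop` is a hypothetical helper, not part of any LAP toolchain.

def picojoules_per_flop(gflops_per_watt: float) -> float:
    """Return the energy cost of one FLOP, in picojoules, for a given GFLOPS/W."""
    if gflops_per_watt <= 0:
        raise ValueError("efficiency must be positive")
    flops_per_joule = gflops_per_watt * 1e9
    joules_per_flop = 1.0 / flops_per_joule
    return joules_per_flop * 1e12  # 1 J = 1e12 pJ


if __name__ == "__main__":
    # Estimated single-chip LAP efficiencies quoted in the abstract (45 nm).
    for label, eff in (("single precision", 55.0), ("double precision", 25.0)):
        print(f"{label}: {eff} GFLOPS/W -> {picojoules_per_flop(eff):.1f} pJ/FLOP")
```

Under this conversion, 55 GFLOPS/W is roughly 18.2 pJ per single-precision FLOP and 25 GFLOPS/W is 40 pJ per double-precision FLOP.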
Keywords :
graphics processing units; hardware-software codesign; linear algebra; logic design; mathematics computing; microprocessor chips; performance evaluation; power aware computing; CPU; GPU; LU factorizations; QR factorizations; algorithm-architecture codesign; coarse-grain linear algebra operations; complex LAPACK level operations; double-precision efficiency; hardware-software solution codesign; hardware-software solution cooptimization; level-3 BLAS; linear algebra processor; low power high performance linear algebra compute fabrics; power performance model; single-precision efficiency; size 45 nm; standard linear algebra packages; Algorithm design and analysis; Bandwidth; Computer architecture; Hardware; Kernel; Linear algebra; Accelerator; Linear Algebra; Power efficient;
Conference_Title :
Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2013 IEEE 27th International
Conference_Location :
Cambridge, MA
Print_ISBN :
978-0-7695-4979-8
DOI :
10.1109/IPDPSW.2013.166