DocumentCode :
2000677
Title :
Algorithm/Architecture Codesign of Low Power and High Performance Linear Algebra Compute Fabrics
Author :
Pedram, Ardavan
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of Texas at Austin, Austin, TX, USA
fYear :
2013
fDate :
20-24 May 2013
Firstpage :
2214
Lastpage :
2217
Abstract :
We present the design of specialized compute fabrics that maintain the efficiency of full-custom hardware while providing enough flexibility to execute a whole class of coarse-grain linear algebra operations. The broad vision of this project is to develop integrated and specialized hardware/software solutions that are co-optimized and co-designed across all layers, from the basic hardware foundations up to the application, through standard linear algebra packages. We have designed a specialized linear algebra processor (LAP) that can perform level-3 BLAS and more complex LAPACK-level operations such as Cholesky, LU (with partial pivoting), and QR factorizations. We present a power-performance model that compares state-of-the-art CPUs and GPUs with our design. Our power model reveals sources of inefficiency in CPUs and GPUs, and our LAP design demonstrates how to overcome them. Compared to conventional architectures for linear algebra applications, LAP is orders of magnitude more power-efficient. Based on our estimates, efficiencies of up to 55 and 25 GFLOPS/W in single and double precision, respectively, are achievable on a single chip in standard 45 nm technology.
Keywords :
graphics processing units; hardware-software codesign; linear algebra; logic design; mathematics computing; microprocessor chips; performance evaluation; power aware computing; CPU; GPU; LU factorizations; QR factorizations; algorithm-architecture codesign; coarse-grain linear algebra operations; complex LAPACK level operations; double-precision efficiency; hardware-software solution codesign; hardware-software solution cooptimization; level-3 BLAS; linear algebra processor; low power high performance linear algebra compute fabrics; power performance model; single-precision efficiency; size 45 nm; standard linear algebra packages; Algorithm design and analysis; Bandwidth; Computer architecture; Hardware; Kernel; Linear algebra; Accelerator; Linear Algebra; Power efficient;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2013 IEEE 27th International
Conference_Location :
Cambridge, MA
Print_ISBN :
978-0-7695-4979-8
Type :
conf
DOI :
10.1109/IPDPSW.2013.166
Filename :
6651133