Algorithm/Architecture Codesign of Low Power and High Performance Linear Algebra Compute Fabrics

Author

Pedram, Ardavan

Author_Institution

Dept. of Electr. & Comput. Eng., Univ. of Texas at Austin, Austin, TX, USA

fYear

2013

fDate

20-24 May 2013

Firstpage

2214

Lastpage

2217

Abstract

We show the design of specialized compute fabrics that maintain the efficiency of full-custom hardware while providing enough flexibility to execute a whole class of coarse-grain linear algebra operations. The broad vision of this project is to develop integrated, specialized hardware/software solutions that are co-optimized and co-designed across all layers, from the basic hardware foundations all the way up to the application through standard linear algebra packages. We have designed a specialized linear algebra processor (LAP) that can perform level-3 BLAS and more complex LAPACK-level operations such as Cholesky, LU (with partial pivoting), and QR factorizations. We present a power/performance model that compares state-of-the-art CPUs and GPUs with our design. Our power model reveals sources of inefficiency in CPUs and GPUs, and our LAP design demonstrates how to overcome them. Compared to other conventional architectures for linear algebra applications, the LAP is orders of magnitude more power efficient. Based on our estimates, single- and double-precision efficiencies of up to 55 and 25 GFLOPS/W, respectively, are achievable on a single chip in standard 45 nm technology.

Keywords

graphics processing units; hardware-software codesign; linear algebra; logic design; mathematics computing; microprocessor chips; performance evaluation; power aware computing; CPU; GPU; LU factorizations; QR factorizations; algorithm-architecture codesign; coarse-grain linear algebra operations; complex LAPACK level operations; double-precision efficiency; hardware-software solution codesign; hardware-software solution cooptimization; level-3 BLAS; linear algebra processor; low power high performance linear algebra compute fabrics; power performance model; single-precision efficiency; size 45 nm; standard linear algebra packages; Algorithm design and analysis; Bandwidth; Computer architecture; Hardware; Kernel; Linear algebra; Accelerator; Linear Algebra; Power efficient;

fLanguage

English

Publisher

IEEE

Conference_Title

Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2013 IEEE 27th International

Conference_Location

Cambridge, MA

Print_ISBN

978-0-7695-4979-8

Type

conf

DOI

10.1109/IPDPSW.2013.166

Filename

6651133