DocumentCode
1958471
Title
High performance and memory efficient implementation of matrix multiplication on FPGAs
Author
Wu, Guiming ; Dou, Yong ; Wang, Miao
Author_Institution
Nat. Lab. for Parallel & Distrib. Process., Nat. Univ. of Defense Technol., Changsha, China
fYear
2010
fDate
8-10 Dec. 2010
Firstpage
134
Lastpage
137
Abstract
We present a high-performance, memory-efficient hardware implementation of matrix multiplication for dense matrices of any size on FPGA devices. By applying a series of transformations and optimizations to the original serial algorithm, we obtain an I/O- and memory-optimized block algorithm for matrix multiplication on FPGAs. A linear array of processing elements (PEs) is proposed to implement this block algorithm. We show a significant reduction in hardware resource consumption compared with related work while increasing clock frequency. Moreover, the memory requirement is reduced from O(S^2) to O(S), where S is the block size. Therefore, more PEs can be integrated into the same FPGA device.
Keywords
circuit optimisation; field programmable gate arrays; matrix multiplication; FPGA devices; clock frequency; memory optimized block algorithm; processing elements; serial algorithm; Algorithm design and analysis; Arrays; Field programmable gate arrays; Hardware; Memory management; Optimization; Random access memory
fLanguage
English
Publisher
IEEE
Conference_Titel
Field-Programmable Technology (FPT), 2010 International Conference on
Conference_Location
Beijing
Print_ISBN
978-1-4244-8980-0
Type
conf
DOI
10.1109/FPT.2010.5681769
Filename
5681769