DocumentCode :
1958471
Title :
High performance and memory efficient implementation of matrix multiplication on FPGAs
Author :
Wu, Guiming ; Dou, Yong ; Wang, Miao
Author_Institution :
Nat. Lab. for Parallel & Distrib. Process., Nat. Univ. of Defense Technol., Changsha, China
fYear :
2010
fDate :
8-10 Dec. 2010
Firstpage :
134
Lastpage :
137
Abstract :
We present a high-performance, memory-efficient hardware implementation of matrix multiplication for dense matrices of any size on FPGA devices. By applying a series of transformations and optimizations to the original serial algorithm, we obtain an I/O- and memory-optimized block algorithm for matrix multiplication on FPGAs. A linear array of processing elements (PEs) is proposed to implement this block algorithm. We show a significant reduction in hardware resource consumption compared with related work, while increasing the clock frequency. Moreover, the memory requirement is reduced from O(S^2) to O(S), where S is the block size. Therefore, more PEs can be integrated into the same FPGA device.
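As a rough illustration of what a block algorithm for dense matrix multiplication looks like in software, the sketch below computes the product in S x S output tiles and accumulates each tile through rank-1 updates, so each step reads at most S values from A and S values from B. It is a generic textbook-style sketch, not the paper's PE array or its on-chip storage scheme, which the abstract does not describe; the function name `blocked_matmul` and its parameters are hypothetical.

```python
def blocked_matmul(a, b, s):
    """Multiply a (m x k) by b (k x n), one s x s output tile at a time.

    Illustrative only: a software sketch of a block algorithm, not the
    hardware design from the paper. Matrices are lists of row lists.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, s):
        for j0 in range(0, n, s):
            # Clip the tile at the matrix edge so any size is handled.
            i1, j1 = min(i0 + s, m), min(j0 + s, n)
            for p in range(k):
                col = [a[i][p] for i in range(i0, i1)]  # at most s values of A
                row = b[p][j0:j1]                       # at most s values of B
                # Rank-1 update of the current output tile.
                for di, av in enumerate(col):
                    c_row = c[i0 + di]
                    for dj, bv in enumerate(row):
                        c_row[j0 + dj] += av * bv
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(blocked_matmul(a, b, 2))
```

The block size `s` only changes the order in which partial products are accumulated, so the result matches an unblocked multiplication for any `s >= 1`.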
Keywords :
circuit optimisation; field programmable gate arrays; matrix multiplication; FPGA devices; clock frequency; memory optimized block algorithm; processing elements; serial algorithm; Algorithm design and analysis; Arrays; Field programmable gate arrays; Hardware; Memory management; Optimization; Random access memory
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Field-Programmable Technology (FPT), 2010 International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-8980-0
Type :
conf
DOI :
10.1109/FPT.2010.5681769
Filename :
5681769