Title :
High performance and memory efficient implementation of matrix multiplication on FPGAs
Author :
Wu, Guiming ; Dou, Yong ; Wang, Miao
Author_Institution :
Nat. Lab. for Parallel & Distrib. Process., Nat. Univ. of Defense Technol., Changsha, China
Abstract :
We present a high-performance, memory-efficient hardware implementation of matrix multiplication for dense matrices of any size on FPGA devices. By applying a series of transformations and optimizations to the original serial algorithm, we obtain an I/O- and memory-optimized block algorithm for matrix multiplication on FPGAs. A linear array of processing elements (PEs) is proposed to implement this block algorithm. We show a significant reduction in hardware resource consumption compared with related work, while increasing clock frequency. Moreover, the memory requirement is reduced from O(S^2) to O(S), where S is the block size. Therefore, more PEs can be integrated into the same FPGA device.
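For readers unfamiliar with block algorithms, the general idea of multiplying matrices in S x S blocks can be sketched in software as follows. This is only an illustrative sketch of generic blocked matrix multiplication, not the authors' hardware design: the function name `blocked_matmul` and its parameters are hypothetical, and the sketch does not reproduce the paper's PE array or its O(S) memory optimization.

```python
def blocked_matmul(a, b, s):
    """Multiply dense matrices a (n x k) and b (k x m) in s x s blocks.

    Illustrative assumption: matrices are lists of row lists, and s is
    the block size (S in the abstract). Sizes need not be multiples of s.
    """
    if s < 1:
        raise ValueError("block size must be positive")
    n, k, m = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")

    c = [[0] * m for _ in range(n)]
    # Outer three loops walk over blocks; inner three loops work inside one block.
    for i0 in range(0, n, s):
        for j0 in range(0, m, s):
            for k0 in range(0, k, s):
                for i in range(i0, min(i0 + s, n)):
                    for kk in range(k0, min(k0 + s, k)):
                        a_ik = a[i][kk]  # reused across one row of the B block
                        for j in range(j0, min(j0 + s, m)):
                            c[i][j] += a_ik * b[kk][j]
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(blocked_matmul(a, b, 2))
```

Processing one block of the result at a time is what lets a design bound its on-chip storage by the block size rather than by the full matrix dimensions; how the paper lowers that bound to O(S) is not described in the abstract.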
Keywords :
circuit optimisation; field programmable gate arrays; matrix multiplication; FPGA devices; clock frequency; memory optimized block algorithm; processing elements; serial algorithm; Algorithm design and analysis; Arrays; Field programmable gate arrays; Hardware; Memory management; Optimization; Random access memory
Conference_Title :
Field-Programmable Technology (FPT), 2010 International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-8980-0
DOI :
10.1109/FPT.2010.5681769