DocumentCode
582919
Title
High performance finite impulse response filter on graphics processors
Author
Rongxin Qu ; Chunhong Zhang ; Jinkuan Wang ; Yun Wei
Author_Institution
Sch. of Comput. Eng., Northeastern Univ., Qin Huang Dao, China
fYear
2012
fDate
15-17 July 2012
Firstpage
769
Lastpage
772
Abstract
A high-performance FIR filtering algorithm for the GPU is presented, based on the traditional overlapped-save method for fast FIR filtering. The algorithm uses a symmetric segmentation approach to partition the input data into blocks for processing, which optimizes GPU memory access and minimizes branch divergence within a warp. In addition, a zero-padding method extends the short time-domain coefficient vector of the FIR filter to the length at which the FFT library running on the GPU performs best, improving the algorithm's performance gain for FIR filters with short tap lengths. The algorithm achieves a host-memory-to-host-memory throughput of over 600M samples per second on the NVIDIA Tesla M2090, with typical performance improvements of 4 to 6 times over Intel IPP for large chunk sizes.
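As a rough illustration of the two ideas the abstract names (block-wise overlapped-save filtering, and zero-padding the short tap vector to a chosen FFT length), here is a minimal CPU-only sketch in plain Python. It is not the authors' GPU implementation and does not model the symmetric segmentation or any memory-access optimization. The function names `fft`, `overlap_save` and `direct_fir`, and the parameter `fft_size`, are hypothetical and chosen only for this example.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def overlap_save(x, h, fft_size):
    """Filter x with FIR taps h using overlapped-save blocks of length fft_size."""
    m = len(h)
    # Assumption of this sketch: fft_size is a power of two and at least len(h).
    assert fft_size >= m and fft_size & (fft_size - 1) == 0
    step = fft_size - (m - 1)  # valid output samples produced per block

    # Zero-pad the short tap vector up to the chosen FFT size, transform once.
    H = fft(list(h) + [0.0] * (fft_size - m))

    # Prepend m-1 zeros of history; pad the tail so the last block is full.
    n_blocks = -(-len(x) // step)  # ceiling division
    padded = [0.0] * (m - 1) + list(x) + [0.0] * (n_blocks * step - len(x))

    y = []
    for b in range(n_blocks):
        block = padded[b * step : b * step + fft_size]
        X = fft(block)
        yb = fft([a * c for a, c in zip(X, H)], inverse=True)
        # The first m-1 samples are corrupted by circular wrap-around: drop them.
        # Divide by fft_size because the inverse transform is unnormalized.
        y.extend(v.real / fft_size for v in yb[m - 1 :])
    return y[: len(x)]


def direct_fir(x, h):
    """Reference time-domain FIR: y[n] = sum_k h[k] * x[n-k], zero initial state."""
    return [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(x))
    ]
```

For example, `overlap_save(x, [0.5, 0.25, -0.125], 8)` should agree with `direct_fir(x, [0.5, 0.25, -0.125])` up to floating-point rounding. The FFT length of 8 here is arbitrary; the abstract's point is that on the GPU this length would instead be chosen to match whatever size the FFT library handles fastest.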
Keywords
FIR filters; graphics processing units; FFT library; FIR filtering; GPU memory access; finite impulse response filter; graphics processors; overlapped save method; performance gain; short time-domain coefficients; symmetric segmentation; zero padding method; Algorithm design and analysis; Computer architecture; Finite impulse response filter; Graphics processing units; Instruction sets; Throughput;
fLanguage
English
Publisher
IEEE
Conference_Titel
Intelligent Control and Information Processing (ICICIP), 2012 Third International Conference on
Conference_Location
Dalian
Print_ISBN
978-1-4577-2144-1
Type
conf
DOI
10.1109/ICICIP.2012.6391489
Filename
6391489