  • DocumentCode
    582919
  • Title
    High performance finite impulse response filter on graphics processors
  • Author
    Rongxin Qu ; Chunhong Zhang ; Jinkuan Wang ; Yun Wei
  • Author_Institution
    Sch. of Comput. Eng., Northeastern Univ., Qin Huang Dao, China
  • fYear
    2012
  • fDate
    15-17 July 2012
  • Firstpage
    769
  • Lastpage
    772
  • Abstract
    A high-performance FIR filtering algorithm for the GPU is presented, based on the traditional overlap-save method for fast FIR filtering. The algorithm uses a symmetric segmentation approach to partition the input data into blocks for processing; this approach optimizes GPU memory access and minimizes branch divergence within a warp. In addition, to improve the algorithm's performance for FIR filters with short tap lengths, a zero-padding method extends the short time-domain filter coefficients to the length at which the FFT library running on the GPU achieves its best performance. Measured from host memory to host memory, the algorithm achieves a throughput of over 600M samples per second on an NVIDIA Tesla M2090, with typical speedups of 4 to 6 times over Intel IPP for large chunk sizes.
  • Keywords
    FIR filters; graphics processing units; FFT library; FIR filtering; GPU memory access; finite impulse response filter; graphics processors; overlapped save method; performance gain; short time-domain coefficients; symmetric segmentation; zero padding method; Algorithm design and analysis; Computer architecture; Finite impulse response filter; Graphics processing units; Instruction sets; Throughput;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Control and Information Processing (ICICIP), 2012 Third International Conference on
  • Conference_Location
    Dalian
  • Print_ISBN
    978-1-4577-2144-1
  • Type
    conf
  • DOI
    10.1109/ICICIP.2012.6391489
  • Filename
    6391489
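The abstract builds on the overlap-save method, with the short FIR taps zero-padded to an FFT-friendly length. The sketch below is a minimal, CPU-only Python illustration of that general technique. It is not the authors' GPU implementation and does not model their symmetric segmentation or warp-level behaviour. It assumes a power-of-two FFT size as a stand-in for "the best size" of the GPU FFT library. The names `fft`, `next_pow2`, `overlap_save` and `direct_fir` are hypothetical helpers introduced only for illustration.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def next_pow2(n):
    """Smallest power of two >= n (assumed 'best' FFT size for this sketch)."""
    p = 1
    while p < n:
        p *= 2
    return p


def overlap_save(x, h, fft_size=None):
    """Return the first len(x) samples of the linear convolution of x with taps h."""
    m = len(h)
    if fft_size is None:
        fft_size = next_pow2(2 * m)  # assumption: power-of-two block, at least 2m long
    if fft_size < m or fft_size & (fft_size - 1):
        raise ValueError("fft_size must be a power of two and >= len(h)")
    step = fft_size - (m - 1)  # new input samples consumed per block
    # Zero-pad the short taps to the FFT size and transform them once.
    H = fft(list(h) + [0.0] * (fft_size - m))
    # Prepend m-1 zeros so the first block has (zero) history.
    padded = [0.0] * (m - 1) + list(x)
    y = []
    pos = 0
    while pos < len(x):
        block = padded[pos:pos + fft_size]
        block += [0.0] * (fft_size - len(block))
        Y = fft([a * b for a, b in zip(fft(block), H)], inverse=True)
        # Discard the first m-1 circularly aliased outputs; scale the inverse FFT by 1/N.
        y.extend(v.real / fft_size for v in Y[m - 1:])
        pos += step
    return y[:len(x)]


def direct_fir(x, h):
    """Reference time-domain FIR filter used to check the overlap-save result."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


if __name__ == "__main__":
    x = [float((i * 3) % 7) - 3.0 for i in range(50)]
    h = [0.5, 0.25, -0.125]
    err = max(abs(a - b) for a, b in zip(overlap_save(x, h), direct_fir(x, h)))
    print("max abs error:", err)
```

Each block of `fft_size` samples overlaps the previous one by `len(h) - 1` samples. Only the outputs free of circular wrap-around are kept, so the filter transform `H` is computed once and reused for every block. Padding a short filter up to a larger FFT size trades some extra arithmetic for fewer blocks and an FFT length the library handles efficiently. That is the motivation the abstract gives for the zero-padding step.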