• DocumentCode
    1018847
  • Title
    Generic Multiphase Software Pipelined Partial FFT on Instruction Level Parallel Architectures
  • Author
    Li, Min; Novo, David; Bougard, Bruno; Carlson, Trevor; Van der Perre, Liesbet; Catthoor, Francky
  • Author_Institution
    ESAT, K.U. Leuven, Leuven
  • Volume
    57
  • Issue
    4
  • fYear
    2009
  • fDate
    4/1/2009 12:00:00 AM
  • Firstpage
    1604
  • Lastpage
    1615
  • Abstract
    The partial fast Fourier transform (PFFT) is an extended fast Fourier transform (FFT) in which only part of the input or output bins are used. By pruning useless data flow, it is possible to achieve a significant speedup in many important applications. Although the theoretical aspects of the PFFT have been thoroughly studied over the past three decades, efficient and generic implementations have rarely been reported. The most important obstacle to optimizing the PFFT is its highly irregular data flow and the associated control flow. In addition, a size-N PFFT has 2^N possible data flow patterns, so finding a flexible but efficient implementation is very challenging. Our contribution is a generic method to map the highly irregular data flow of an arbitrary PFFT onto instruction level parallel architectures using software pipelining. By leveraging the algorithmic-level flexibility of an FFT, we select an appropriate data flow variant that enables aggressive optimizations in the implementation. We then apply a divide and conquer strategy, partitioning the PFFT into three phases. For each phase, we introduce specialized control structures, loop structures, address generation schemes and memory operations. This reduces the cycle count, the number of executed instructions and the number of memory accesses. By studying ten representative benchmarks from wireless baseband applications, we produce repeatable and successful results on the TMS320C6000. Compared with two optimized FFT implementations, our work reduces the cycle count by 20.5% to 87.5%, executed instructions by 11.2% to 86.5%, and L1D and L1P cache accesses by 16.1% to 79.4% and 19.5% to 87.1%, respectively. To the best of our knowledge, this is the first reported work on a generic software pipelined PFFT for instruction level parallel architectures.
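    To illustrate what "pruning useless data flow" means, here is a minimal sketch of output pruning in a plain recursive radix-2 decimation-in-time FFT: each half-size sub-FFT is asked only for the bins its parent actually needs, so butterflies feeding unused output bins are never computed. This is a generic textbook pruning illustration, assuming a power-of-two length; it is not the authors' multiphase software pipelined method, and the function name `pruned_fft` is hypothetical.

```python
import cmath
from typing import Dict, Iterable, Sequence


def pruned_fft(x: Sequence[complex], wanted: Iterable[int]) -> Dict[int, complex]:
    """Return {k: X[k]} only for the DFT bins k in `wanted`.

    Hypothetical helper: recursive radix-2 decimation-in-time FFT with
    output pruning. len(x) must be a power of two.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    wanted = sorted(set(wanted))
    if any(k < 0 or k >= n for k in wanted):
        raise ValueError("requested bin out of range")
    if n == 1:
        # A size-1 DFT is the sample itself.
        return {0: complex(x[0])} if wanted else {}
    half = n // 2
    # Bins needed from each half-size FFT: X[k] uses E[k mod N/2] and O[k mod N/2].
    sub = {k % half for k in wanted}
    even = pruned_fft(x[0::2], sub)
    odd = pruned_fft(x[1::2], sub)
    out: Dict[int, complex] = {}
    for k in wanted:
        m = k % half
        twiddle = cmath.exp(-2j * cmath.pi * k / n)
        out[k] = even[m] + twiddle * odd[m]
    return out
```

    For example, `pruned_fft(samples, [1, 5, 6])` on an 8-point input returns only bins 1, 5 and 6. Because the set of requested bins determines which butterflies survive, each of the 2^N possible subsets yields a different data flow, which is the irregularity the abstract identifies as the main obstacle to efficient implementation.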
  • Keywords
    data flow computing; fast Fourier transforms; instruction sets; parallel architectures; pipeline processing; TMS320C6000; data flow patterns; divide and conquer strategy; generic method; generic multiphase software; instruction level parallel architectures; partial FFT; partial fast Fourier transform; software pipelining; FFT; ILP; OFDMA; PFFT; VLIW;
  • fLanguage
    English
  • Journal_Title
    Signal Processing, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1053-587X
  • Type
    jour
  • DOI
    10.1109/TSP.2008.2010422
  • Filename
    4695944