• DocumentCode
    1475406
  • Title

    Improved SIMD Architecture for High Performance Video Processors

  • Author

    Lo, Wing-Yee ; Lun, Daniel Pak-Kong ; Siu, Wan-chi ; Wang, Wendong ; Song, Jiqiang

  • Author_Institution
    Dept. of Electron. & Inf. Eng., Hong Kong Polytech. Univ., Kowloon, China
  • Volume
    21
  • Issue
    12
  • fYear
    2011
  • Firstpage
    1769
  • Lastpage
    1783
  • Abstract
    Single instruction multiple data (SIMD) execution is without doubt an efficient way to exploit the data-level parallelism in image and video applications. However, SIMD execution bottlenecks must be tackled to achieve high execution efficiency. In this paper, we first analyze the implementation of two major kernel functions of H.264/AVC, namely SATD and subpel interpolation, in conventional SIMD architectures to identify the bottlenecks in traditional approaches. Based on the analysis results, we propose a new SIMD architecture with two novel features: 1) a parallel memory structure with variable block size and word length support, and 2) a configurable SIMD structure. The proposed parallel memory structure gives programmers great flexibility to access data of different block sizes and word lengths. The configurable SIMD structure allows almost “random” register file access and slightly different operations in the ALUs inside the SIMD structure. The new features greatly benefit the realization of H.264/AVC kernel functions. For instance, fractional motion estimation, particularly the half-pixel to quarter-pixel interpolation, can now be executed with minimal or no additional memory access. Compared with conventional SIMD systems, the proposed SIMD architecture achieves a further speedup of 2.1X to 4.6X when implementing H.264/AVC kernel functions. Based on Amdahl's law, the overall speedup of an H.264/AVC encoding application is projected to be 2.46X. We expect that significant improvement can also be achieved when applying the proposed architecture to other image and video processing applications.
  • Keywords
    information retrieval; memory architecture; parallel architectures; reconfigurable architectures; video codecs; ALU; Amdahl law; H.264-AVC encoding application; H.264-AVC kernel function; SATD; configurable SIMD structure; data access; data level parallelism; fractional motion estimation; high execution efficiency; high performance video processor; image application; improved SIMD architecture; parallel memory structure; quarter pixel interpolation; random register file access; single instruction multiple data execution; subpel interpolation; video application; word length support; Memory architecture; Parallel processing; Video codecs; Video signal processing; Configurable SIMD; SIMD bottlenecks; parallel memory structure; video codec processor;
  • fLanguage
    English
  • Journal_Title
    Circuits and Systems for Video Technology, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1051-8215
  • Type

    jour

  • DOI
    10.1109/TCSVT.2011.2130250
  • Filename
    5734815
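
The abstract projects an overall encoder speedup from per-kernel speedups using Amdahl's law. The following is a minimal sketch of that kind of projection, not the authors' calculation: the function name `amdahl_speedup` and the runtime fractions in the example profile are hypothetical, since the record does not give the encoder's runtime breakdown. It will therefore not reproduce the 2.46X figure.

```python
def amdahl_speedup(kernels):
    """Project overall speedup with Amdahl's law.

    kernels: list of (fraction_of_original_runtime, kernel_speedup) pairs.
    Runtime not covered by any pair is assumed to run at its original speed.
    """
    if any(f < 0.0 or s <= 0.0 for f, s in kernels):
        raise ValueError("fractions must be >= 0 and speedups must be > 0")
    accelerated = sum(f for f, _ in kernels)
    if accelerated > 1.0:
        raise ValueError("runtime fractions must not sum to more than 1")
    # New runtime, normalized to the original: untouched part plus each
    # accelerated part divided by its own speedup.
    new_time = (1.0 - accelerated) + sum(f / s for f, s in kernels)
    return 1.0 / new_time


# Hypothetical profile (assumed values, not from the paper):
# one kernel takes 50% of runtime and runs 2x faster,
# another takes 25% of runtime and runs 4x faster.
profile = [(0.5, 2.0), (0.25, 4.0)]
print(f"projected overall speedup: {amdahl_speedup(profile):.2f}X")
```

The formula shows why an overall figure can sit near the low end of the per-kernel range: the share of runtime that is not accelerated bounds the total gain, however fast the individual kernels become.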