  • DocumentCode
    738162
  • Title
    Fast Motion Estimation Algorithm and Design for Real Time QFHD High Efficiency Video Coding
  • Author
    Shiaw-Yu Jou; Shan-Jung Chang; Tian-Sheuan Chang
  • Author_Institution
    PixArt, Hsinchu, Taiwan
  • Volume
    25
  • Issue
    9
  • fYear
    2015
  • Firstpage
    1533
  • Lastpage
    1544
  • Abstract
    Motion estimation (ME) in the latest High Efficiency Video Coding standard adopts the quadtree coding structure and up to a 64 × 64 prediction unit (PU) size to improve the coding gain. However, these techniques also have serious design problems regarding the complexity, data dependency, external memory bandwidth, and on-chip buffer size compared with previous standards, especially for real-time ultrahigh-definition video coding. To solve these problems, this paper proposes an efficient ME design with a joint algorithm and architecture optimization. To reduce complexity, we propose a predictive integer ME (IME) algorithm that selects the most probable search directions and steps through a statistical analysis to reduce the number of search points by 90.5%. We also employ a PU size-dependent fractional ME (FME) algorithm to reduce the interpolation filtering by 62.4% compared with the reference software. To resolve the corresponding dependency, we cascade the IME and FME computations via interlaced scheduling and propose an early motion vector prediction candidate approach. We use this scheduling with a 16 × 16 processing unit to compute the partial matching cost of all PUs with the same 16 × 16 current block in an interlaced order and share their common reference block to reduce the on-chip buffer size and off-chip memory bandwidth. The bandwidth is further reduced by a cache with double Z scan indexed addressing to simplify the cache controller. Implementation with a Taiwan Semiconductor Manufacturing Company 90-nm CMOS process supports the real-time encoding of 4 K × 2 K at 60 frames/s operated at 270 MHz with 778.7k logic gates and 17.4 KB of on-chip memory.
  • Keywords
    CMOS integrated circuits; VLSI; cache storage; filtering theory; integrated circuit design; interpolation; logic gates; motion estimation; quadtrees; search problems; statistical analysis; video coding; IME algorithm; PU size-dependent fractional ME algorithm; Taiwan Semiconductor Manufacturing Company 90-nm CMOS process; cache controller; coding gain improvement; complexity reduction; data dependency; design problems; early motion vector prediction candidate approach; external memory bandwidth; fast motion estimation algorithm; interlaced scheduling; interpolation filtering reduction; joint algorithm optimization; joint architecture optimization; logic gates; most probable search directions; off-chip memory bandwidth reduction; on-chip buffer size reduction; partial matching cost computation; prediction unit size; predictive integer ME algorithm; processing unit; quadtree coding structure; real time QFHD high efficiency video coding; statistical analysis; Algorithm design and analysis; Bandwidth; Complexity theory; Encoding; Prediction algorithms; Standards; System-on-chip; HEVC; High Efficiency Video Coding (HEVC); Motion estimation; VLSI architecture; motion estimation (ME); very-large-scale integration (VLSI) architecture;
  • fLanguage
    English
  • Journal_Title
    Circuits and Systems for Video Technology, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1051-8215
  • Type
    jour
  • DOI
    10.1109/TCSVT.2015.2389472
  • Filename
    7005433
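
The throughput figures quoted in the abstract (60 frames/s at 270 MHz with a 16 × 16 processing unit) imply a fixed average cycle budget per block. The sketch below works that arithmetic out. It assumes QFHD means 3840 × 2160 luma samples per frame, which the record does not state explicitly; the function name and constants are illustrative only and do not come from the paper.

```python
from fractions import Fraction

# Figures taken from the abstract.
CLOCK_HZ = 270_000_000
FRAMES_PER_SECOND = 60
BLOCK_SIZE = 16      # 16 x 16 processing unit
MAX_PU_SIZE = 64     # largest prediction unit

# Assumption (not stated in the record): QFHD is 3840 x 2160 samples.
FRAME_WIDTH = 3840
FRAME_HEIGHT = 2160


def cycles_per_block(block_size: int) -> Fraction:
    """Average clock cycles available per block_size x block_size block."""
    blocks_per_row = Fraction(FRAME_WIDTH, block_size)
    blocks_per_col = Fraction(FRAME_HEIGHT, block_size)
    blocks_per_second = blocks_per_row * blocks_per_col * FRAMES_PER_SECOND
    return Fraction(CLOCK_HZ) / blocks_per_second


if __name__ == "__main__":
    for size in (BLOCK_SIZE, MAX_PU_SIZE):
        budget = cycles_per_block(size)
        print(f"{size}x{size}: about {float(budget):.1f} cycles per block")
```

Under these assumptions the budget is roughly 139 cycles per 16 × 16 block, or about 2222 cycles per 64 × 64 block. This is only an average derived from the headline numbers; it says nothing about how the actual design distributes cycles between the IME and FME stages.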