Title :
Energy and area-efficient hardware implementation of HEVC inverse transform and dequantization
Author :
Tikekar, Mehul ; Chao-Tsung Huang ; Sze, Vivienne ; Chandrakasan, Anantha
Author_Institution :
Massachusetts Inst. of Technol., Cambridge, MA, USA
Abstract :
The High Efficiency Video Coding (HEVC) inverse transform for residual coding uses 2-D transforms from 4×4 to 32×32 with higher precision than the 4×4 and 8×8 transforms of H.264/AVC, resulting in increased hardware complexity. In this paper, an energy- and area-efficient VLSI architecture of an HEVC-compliant inverse transform and dequantization engine is presented. We implement a pipelining scheme to process all transform sizes at a minimum throughput of 2 pixels/cycle, with zero-column skipping for improved throughput. We use data-gating in the 1-D Inverse Discrete Cosine Transform engine to improve energy efficiency for smaller transform sizes. A high-density SRAM-based transpose memory is used for an area-efficient design. This design supports decoding of 4K Ultra-HD (3840×2160) video at 30 frames/s. In 40 nm CMOS technology, the inverse transform engine takes 98.1 kgates of logic, 16.4 kbit of SRAM and 10.82 pJ/pixel, while the dequantization engine takes 27.7 kgates of logic, 8.2 kbit of SRAM and 1.10 pJ/pixel. Although larger transforms require more computation per coefficient, they typically contain a smaller proportion of non-zero coefficients. Due to this trade-off, larger transforms can be more energy-efficient.
Keywords :
CMOS memory circuits; SRAM chips; VLSI; data compression; decoding; discrete cosine transforms; energy conservation; integrated circuit design; inverse transforms; power aware computing; quantisation (signal); video coding; 1D inverse discrete cosine transform engine; 4K ultra-HD video decoding; CMOS technology; H.264/AVC; HEVC dequantization engine; HEVC-compliant inverse transform; area-efficient VLSI architecture; area-efficient hardware implementation; data-gating; energy-efficient VLSI architecture; energy-efficient hardware implementation; hardware complexity; high efficiency video coding; high-density SRAM-based transpose memory; nonzero coefficients; pipelining scheme; residual coding; size 40 nm; storage capacity 16.4 Kbit; storage capacity 8.2 Kbit; transform sizes; zero-column skipping; Engines; Laplace equations; Pipeline processing; Random access memory; Throughput; Transforms; Video coding; Data Gating; HEVC; Inverse Discrete Cosine Transform; Transpose Memory;
Conference_Title :
Image Processing (ICIP), 2014 IEEE International Conference on
Conference_Location :
Paris
DOI :
10.1109/ICIP.2014.7025421