DocumentCode :
48999
Title :
Sparse Spatio-Temporal Representation With Adaptive Regularized Dictionary Learning for Low Bit-Rate Video Coding
Author :
Hongkai Xiong ; Zhiming Pan ; Xinwei Ye ; Chang Wen Chen
Author_Institution :
Dept. of Electron. Eng., Shanghai Jiao Tong Univ., Shanghai, China
Volume :
23
Issue :
4
fYear :
2013
fDate :
Apr. 2013
Firstpage :
710
Lastpage :
728
Abstract :
For promising vision-based video coding on low-quality data, this paper proposes a sparse spatio-temporal representation with adaptive regularized dictionary learning and develops a low bit-rate video coding scheme. In a reversed-complexity Wyner-Ziv coding manner, it selects a subset of key frames to code at original resolution, while the rest are down-sampled and reconstructed by a sparse spatio-temporal approximation using key frames as a training dataset. Since primitive patches (geometry) are of low dimensionality and can be well learned from the primitive patches across frames in a scale space, a video frame is divided into three layers: a primitive layer, a nonprimitive coarse layer, and a nonprimitive smooth layer. The multiscale differential feature representations are invertible to facilitate reconstruction with dictionary learning, and the target is formulated as an optimization problem by constructing a sparse representation of 2-D patches and 3-D volumes over adaptive regularized dictionaries, a set of 2-D subdictionary pairs trained from primitive patches, and a 3-D dictionary trained from nonprimitive volumes. Specifically, the nonprimitive layer is constructed as volumes in order to keep it consistent along the motion trajectory, which enables sparse representations over a learned 3-D spatio-temporal dictionary. Through hierarchical bidirectional motion estimation and adaptive overlapped block motion compensation, the 3-D low-frequency and high-frequency dictionary pair is designed by the K-SVD algorithm to update the atoms for optimal sparse representation and convergence. In reconstruction, the lost high-frequency information of the down-sampled frames can be synthesized from the sparse spatio-temporal representation over the adaptive regularized dictionaries. Extensive experiments validate the compression efficiency of the proposed scheme versus H.264/AVC in terms of both objective and subjective comparisons.
Keywords :
image representation; motion compensation; motion estimation; video coding; 2-D patches; 3-D dictionary; 3-D volumes; H.264/AVC; K-SVD algorithm; adaptive overlapped block motion compensation; adaptive regularized dictionaries; adaptive regularized dictionary learning; bit-rate video coding scheme; hierarchical bidirectional motion estimation; high-frequency dictionary pair; low bit-rate video coding; multiscale differential feature representations; optimal sparse representation; primitive patches; reversed-complexity Wyner-Ziv coding; sparse representation; sparse spatio-temporal approximation; sparse spatio-temporal representation; vision-based video coding; Dictionaries; Hafnium; Image edge detection; Image reconstruction; Image resolution; Training; Video coding; Atom decomposition; dictionary learning; primitive patch; sparse representation; video coding;
fLanguage :
English
Journal_Title :
Circuits and Systems for Video Technology, IEEE Transactions on
Publisher :
IEEE
ISSN :
1051-8215
Type :
jour
DOI :
10.1109/TCSVT.2012.2221271
Filename :
6317158