Title :
Encoding Multiple Audio Objects Using Intra-Object Sparsity
Author :
Maoshen Jia ; Ziyu Yang ; Changchun Bao ; Xiguang Zheng ; Christian Ritz
Author_Institution :
Sch. of Electron. Inf. & Control Eng., Beijing Univ. of Technol., Beijing, China
Abstract :
Preserving audio scenes in the form of audio objects has become common in recent years. Object-based audio techniques provide more flexibility for personalized rendering as well as more accurate audio object trajectories. This work presents a new lossy compression framework for encoding and transmitting multiple simultaneously occurring audio objects. The proposed encoding approach is based on intra-object sparsity (approximate k-sparsity). After a quantitative measure of approximate k-sparsity is established, statistical analysis is employed to validate the proposed intra-object sparsity of audio objects. By exploiting this intra-object sparsity, multiple simultaneously occurring audio objects are compressed into a mono downmix signal with side information. The downmix signal can be further compressed by legacy audio codecs, while the side information is transmitted losslessly. Objective and subjective evaluations with up to eight audio objects showed that the proposed compression framework achieves better perceptual quality than an existing technique. The subjective evaluations also confirmed that the proposed approach supports transmission that scales with the available bandwidth while preserving the perceptual quality of both the individual audio objects and the spatial audio scenes.
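The abstract mentions a quantitative measure of approximate k-sparsity but does not state it. As a rough illustration only, the sketch below assumes one common energy-based definition: the smallest k such that the k largest-magnitude coefficients of a frame hold at least a given fraction of the frame's total energy. The function name approx_k_sparsity, the energy_ratio parameter, and its default of 0.9 are hypothetical and are not taken from the paper.

```python
def approx_k_sparsity(coeffs, energy_ratio=0.9):
    """Return the smallest k such that the k largest-magnitude coefficients
    hold at least `energy_ratio` of the total energy.

    This is an assumed, energy-based definition used for illustration;
    the paper's own measure may differ.
    """
    # Per-coefficient energy, largest first (abs() also handles complex values).
    energies = sorted((abs(c) ** 2 for c in coeffs), reverse=True)
    total = sum(energies)
    if total == 0.0:
        return 0  # an all-zero frame needs no coefficients
    running = 0.0
    for k, energy in enumerate(energies, start=1):
        running += energy
        if running >= energy_ratio * total:
            return k
    # Fallback for floating-point rounding when energy_ratio is 1.0.
    return len(energies)


# Hypothetical frame of transform coefficients for one audio object.
frame = [4.0, 0.1, -3.0, 0.2, 0.05]
k = approx_k_sparsity(frame)  # two coefficients carry over 90% of the energy
```

Under this assumed definition, a small k relative to the frame length indicates that the object is approximately sparse in the chosen transform domain, which is the property the abstract says the framework relies on.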
Keywords :
audio coding; codecs; statistical analysis; approximate-sparsity; audio object trajectory; compression framework; intra-object sparsity; legacy audio codecs; mono downmix signal; multiple audio object encoding; objective evaluations; perceptual quality; personalized rendering; side information; spatial audio scenes; subjective evaluations; Encoding; IEEE transactions; Image coding; Instruments; Speech; Speech processing; Time-frequency analysis; Audio object coding; multichannel audio compression; sparsity
Journal_Title :
Audio, Speech, and Language Processing, IEEE/ACM Transactions on
DOI :
10.1109/TASLP.2015.2419980