Title :
Binaural cue coding-Part I: psychoacoustic fundamentals and design principles
Author :
Baumgarte, Frank ; Faller, Christof
Author_Institution :
Media Signal Processing Research Dept., Agere Systems, Allentown, PA, USA
Abstract :
Binaural Cue Coding (BCC) is a method for multichannel spatial rendering based on one down-mixed audio channel and BCC side information. The BCC side information has a low data rate and it is derived from the multichannel encoder input signal. A natural application of BCC is multichannel audio data rate reduction since only a single down-mixed audio channel needs to be transmitted. An alternative BCC scheme for efficient joint transmission of independent source signals supports flexible spatial rendering at the decoder. This paper (Part I) discusses the most relevant binaural perception phenomena exploited by BCC. Based on that, it presents a psychoacoustically motivated approach for designing a BCC analyzer and synthesizer. This leads to a reference implementation for analysis and synthesis of stereophonic audio signals based on a Cochlear Filter Bank. BCC synthesizer implementations based on the FFT are presented as low-complexity alternatives. A subjective audio quality assessment of these implementations shows the robust performance of BCC for critical speech and audio material. Moreover, the results suggest that the performance given by the reference synthesizer is not significantly compromised when using a low-complexity FFT-based synthesizer. The companion paper (Part II) generalizes BCC analysis and synthesis for multichannel audio and proposes complete BCC schemes including quantization and coding. Part II also describes an alternative BCC scheme with flexible rendering capability at the decoder and proposes several applications for both BCC schemes.
Keywords :
audio coding; channel bank filters; fast Fourier transforms; hearing; speech codecs; FFT-based synthesizer; audio channel; auditory filter bank; auditory scene synthesis; binaural cue coding; binaural perception phenomena; binaural source localization; cochlear filter bank; fast Fourier transform; flexible spatial rendering; multichannel encoder; multichannel spatial rendering; psychoacoustic design; psychoacoustic fundamentals; quantization; stereophonic audio signal; Decoding; Filter bank; Psychology; Quality assessment; Quantization; Robustness; Signal analysis; Signal synthesis; Speech synthesis; Synthesizers
Journal_Title :
IEEE Transactions on Speech and Audio Processing
DOI :
10.1109/TSA.2003.818109