Spatial speech coding for multi-teleconferencing

Author

Phua, Kok Soon; Gan, Woon Seng

Author_Institution

Sch. of Electr. & Electron. Eng., Nanyang Technol. Univ., Singapore

Volume

1

fYear

1999

fDate

1999

Firstpage

313

Abstract

This paper describes a structural model for implementing multichannel speech coding for teleconferencing with spatial audio reproduction. Multiple monaural speech sources are synthesized into binaural sound to produce a more realistic videoconferencing environment. The activity information of each binaural speech signal, as determined by the voice activity detection algorithm, is used to calculate two weighting factors prior to mixing. A third level of weight adjustment can then be carried out by adjusting these weighting factors before they are applied to the individual voice sources. A scheme to remove undesirable noise spikes is also introduced. The two channels are then coded individually using the G.723.1 speech codec.

Keywords

sound reproduction; speech codecs; speech coding; teleconferencing; G.723.1 speech codec; binaural sound; cocktail party effect; multichannel speech coding; multiple monaural speech sources; noise spikes removal; spatial audio reproduction; structural model; teleconferencing; videoconferencing environment; voice activity detection algorithm; weighting factors; Detection algorithms; Electronic mail; Gallium nitride; Pulse modulation; Speech codecs; Speech coding; Speech processing; Speech synthesis; Telecommunication standards; Teleconferencing

fLanguage

English

Publisher

IEEE

Conference_Titel

TENCON 99. Proceedings of the IEEE Region 10 Conference

Conference_Location

Cheju Island

Print_ISBN

0-7803-5739-6

Type

conf

DOI

10.1109/TENCON.1999.818413

Filename

818413