Title :
A panoramic video and acoustic beamforming sensor for videoconferencing
Author :
Fiala, Mark ; Green, David ; Roth, Gerhard
Author_Institution :
Comput. Video Group, Nat. Res. Council, Ottawa, Ont., Canada
Abstract :
Videoconferencing systems in use today typically rely on either fixed or pan/tilt/zoom cameras for image acquisition, and on close-talking microphones for good-quality audio capture. These sensors are unsuitable for scenarios involving multiple users seated at a meeting table, or non-stationary users. In these situations, the focus of attention should change from one talker to the next and, if possible, track moving users. This work describes a multi-modal perception system that uses both video and audio signals for such a videoconferencing system. An omnidirectional video camera and an audio beamforming array are combined into a device placed in the center of a meeting table. The video and audio are processed to determine the direction of the talker; a virtual perspective view and a directional audio beam are then created. Computer vision algorithms are used to find people by motion and by face and marker detection. The audio beamformer merges the signals from a circular array of microphones to provide audio power measurements in different directions simultaneously. The video and audio cues are combined to decide the location of the talker. The system has been integrated with OpenH.323 and serves as a node using Microsoft NetMeeting.
Keywords :
acoustic signal processing; array signal processing; computer vision; sensors; teleconferencing; video signal processing; Microsoft NetMeeting; OpenH.323; acoustic beamforming sensor; audio signals; close-talking microphones; computer vision algorithm; image acquisition; multimodal perception system; pan/tilt/zoom camera; panoramic video sensor; video signals; videoconferencing system; Acoustic beams; Acoustic sensors; Array signal processing; Cameras; Computer vision; Face detection; Focusing; Microphone arrays; Motion detection; Teleconferencing;
Conference_Title :
Haptic, Audio and Visual Environments and Their Applications, 2004. HAVE 2004. Proceedings. The 3rd IEEE International Workshop on
Print_ISBN :
0-7803-8817-8
DOI :
10.1109/HAVE.2004.1391880
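Illustrative_Sketch :
The abstract states that the beamformer merges the signals from a circular microphone array to measure audio power in several directions at once. The record does not give the authors' algorithm, so the following is only a minimal sketch of one common technique, narrowband delay-and-sum beamforming, under assumed conditions: a far-field source, a single frequency, and ideal noise-free microphones. All names and values (`NUM_MICS`, `RADIUS`, `FREQ`, `scan_directions`, the 8-microphone layout, the 10 cm radius) are hypothetical and are not taken from the paper.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def mic_angles(num_mics):
    """Azimuths (radians) of microphones evenly spaced on a circle."""
    return [2.0 * math.pi * m / num_mics for m in range(num_mics)]


def steering_phases(azimuth, num_mics, radius, freq):
    """Per-microphone phase factor for a far-field plane wave from `azimuth`."""
    phases = []
    for phi in mic_angles(num_mics):
        # A microphone facing the source receives the wavefront earlier.
        advance = radius * math.cos(azimuth - phi) / SPEED_OF_SOUND
        phases.append(cmath.exp(2j * math.pi * freq * advance))
    return phases


def beam_power(snapshot, azimuth, radius, freq):
    """Delay-and-sum output power when the array is steered toward `azimuth`."""
    weights = steering_phases(azimuth, len(snapshot), radius, freq)
    # Undo the expected phase at each microphone, then average.
    summed = sum(x * w.conjugate() for x, w in zip(snapshot, weights))
    return abs(summed / len(snapshot)) ** 2


def scan_directions(snapshot, radius, freq, num_beams=36):
    """Return (azimuth_degrees, power) for evenly spaced look directions."""
    result = []
    for k in range(num_beams):
        az = 2.0 * math.pi * k / num_beams
        result.append((math.degrees(az), beam_power(snapshot, az, radius, freq)))
    return result


# Simulate one narrowband snapshot from a hypothetical talker at 120 degrees.
NUM_MICS, RADIUS, FREQ = 8, 0.10, 1000.0
snapshot = steering_phases(math.radians(120.0), NUM_MICS, RADIUS, FREQ)
powers = scan_directions(snapshot, RADIUS, FREQ)
best_azimuth, best_power = max(powers, key=lambda p: p[1])
```

In this sketch the direction with the highest power is taken as the audio estimate of the talker's bearing. A system like the one described would presumably combine such an estimate with the video cues before choosing a view, but that fusion step is not specified in this record.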