DocumentCode :
2008093
Title :
Active audio-visual integration for Voice Activity Detection based on a Causal Bayesian Network
Author :
Yoshida, Takafumi ; Nakadai, Kazuhiro
Author_Institution :
Grad. Sch. of Inf. Sci. & Eng., Tokyo Inst. of Technol., Tokyo, Japan
fYear :
2012
fDate :
Nov. 29 2012-Dec. 1 2012
Firstpage :
370
Lastpage :
375
Abstract :
This paper addresses an active audio-visual integration framework that combines audio and visual information with a robot's active motion for noise-robust Voice Activity Detection (VAD). VAD is crucial for noise-robust Automatic Speech Recognition (ASR) because speech captured by a robot's microphones is usually contaminated by other noise sources. To realize such noise-robust VAD, we propose an Active Audio-Visual (AAV) integration framework that integrates auditory, visual, and motion information using a Causal Bayesian Network (CBN). A CBN is a subclass of Bayesian networks that can estimate the effect of active motions on VAD performance. Since a CBN is a general framework for information integration, various types of information that affect VAD performance, such as the locations of a speaker and a noise source, can be introduced into it naturally, and the CBN selects the optimal active motion for better robot perception using its “intervention” mechanism. We implemented a prototype system based on the proposed framework on a humanoid robot called Hearbo. The proposed AAV-VAD is compared with three other types of AV-VAD: simple AAV-VAD, multi-regression-based AAV-VAD, and stationary (not active) AV-VAD. A preliminary experiment using the prototype system showed that the VAD performance of the proposed AAV-VAD was 14.4, 26.0, and 30.3 points higher than that of the simple active, multi-regression-based active, and stationary AV-VAD, respectively.
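The “intervention” idea in the abstract can be illustrated with a toy example. The following is only a minimal sketch under stated assumptions, not the network used in the paper: the variables (a noise level N, a robot motion M, and a VAD-correct flag V), the motion names, and every probability are hypothetical. It shows how the effect of forcing a motion, P(V=1 | do(M=m)), is computed by dropping the motion's own dependence on N, and how that can differ from simply conditioning on an observed motion.

```python
# Toy causal Bayesian network. All names and numbers are hypothetical
# illustrations and are NOT taken from the paper.
# N: noise level (common cause), M: robot motion, V: VAD decision correct.
P_N = {"low": 0.6, "high": 0.4}

# Observational motion policy P(M | N); it is cut away under intervention.
P_M_GIVEN_N = {
    "low": {"stay": 0.8, "turn_to_speaker": 0.1, "approach": 0.1},
    "high": {"stay": 0.2, "turn_to_speaker": 0.4, "approach": 0.4},
}

# P(V = 1 | M, N): chance that VAD is correct for each motion and noise level.
P_V1_GIVEN_M_N = {
    ("stay", "low"): 0.85, ("stay", "high"): 0.50,
    ("turn_to_speaker", "low"): 0.90, ("turn_to_speaker", "high"): 0.70,
    ("approach", "low"): 0.92, ("approach", "high"): 0.80,
}

MOTIONS = ["stay", "turn_to_speaker", "approach"]


def p_correct_do(motion):
    """P(V=1 | do(M=motion)): average over N using its prior P(N)."""
    return sum(P_N[n] * P_V1_GIVEN_M_N[(motion, n)] for n in P_N)


def p_correct_observed(motion):
    """P(V=1 | M=motion): average over N using the posterior P(N | M)."""
    joint = {n: P_N[n] * P_M_GIVEN_N[n][motion] for n in P_N}
    total = sum(joint.values())
    return sum(joint[n] / total * P_V1_GIVEN_M_N[(motion, n)] for n in P_N)


def select_motion():
    """Pick the motion with the highest interventional VAD accuracy."""
    return max(MOTIONS, key=p_correct_do)


if __name__ == "__main__":
    for m in MOTIONS:
        print(m, round(p_correct_do(m), 3), round(p_correct_observed(m), 3))
    print("selected:", select_motion())
```

In this toy setting, observing that the robot stayed still suggests a quiet room, so the conditional estimate for "stay" is more optimistic than the interventional one; selecting motions by the interventional quantity avoids that bias.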
Keywords :
belief networks; human-robot interaction; humanoid robots; robot vision; speech recognition; AAV integration framework; ASR; CBN; Hearbo humanoid robot; VAD; active audio-visual integration; active audio-visual integration framework; auditory information; automatic speech recognition; causal Bayesian network; motion information; multiregression-based active AV-VAD; noise-robust VAD; noise-robust voice activity detection; simple active AV-VAD; stationary AV-VAD; visual information; voice activity detection; Microphones; Robots; Robustness; Speech;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Humanoid Robots (Humanoids), 2012 12th IEEE-RAS International Conference on
Conference_Location :
Osaka
ISSN :
2164-0572
Type :
conf
DOI :
10.1109/HUMANOIDS.2012.6651546
Filename :
6651546