DocumentCode :
2053553
Title :
Improving hands-free speech recognition in a car through audio-visual voice activity detection
Author :
Faubel, Friedrich ; Georges, Munir ; Kumatani, Kenichi ; Bruhn, Andrés ; Klakow, Dietrich
Author_Institution :
Saarland Univ., Saarbrücken, Germany
fYear :
2011
fDate :
May 30 2011-June 1 2011
Firstpage :
70
Lastpage :
75
Abstract :
In this work, we show how speech recognition performance in a noisy car environment can be improved by combining audio-visual voice activity detection (VAD) with microphone array processing techniques. This is accomplished by enhancing the multi-channel audio signal in the speaker localization step through per-channel power spectral subtraction, whose noise estimates are obtained from the non-speech segments identified by VAD. This noise reduction step improves the accuracy of the estimated speaker positions and thereby the quality of the beamformed signal in the subsequent array processing step. Audio-visual voice activity detection has the advantage of being more robust in acoustically demanding environments. This claim is substantiated through speech recognition experiments on the AVICAR corpus, where the proposed localization framework gave a word error rate (WER) of 7.1% in combination with delay-and-sum beamforming. This compares to a WER of 8.9% for speaker localization with audio-only VAD, 11.6% without VAD, and 15.6% for a single distant channel.
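The noise reduction step described in the abstract (per-channel power spectral subtraction with noise estimates taken from VAD-labelled non-speech frames) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes power spectra have already been computed per frame and per channel, that a single frame-level speech/non-speech decision from the VAD is shared by all channels, and that the noise estimate is a plain average over non-speech frames. The over-subtraction factor `alpha` and the spectral floor `floor` are hypothetical parameters; the abstract does not state which spectral subtraction variant or settings were used.

```python
from typing import List, Sequence

Frame = List[float]  # power spectrum of one frame (one value per frequency bin)


def estimate_noise_psd(power_frames: Sequence[Frame], vad_flags: Sequence[bool]) -> Frame:
    """Average the power spectrum over frames the VAD marked as non-speech."""
    noise_frames = [f for f, is_speech in zip(power_frames, vad_flags) if not is_speech]
    if not noise_frames:
        raise ValueError("VAD found no non-speech frames to estimate noise from")
    n_bins = len(noise_frames[0])
    return [sum(f[k] for f in noise_frames) / len(noise_frames) for k in range(n_bins)]


def spectral_subtract(power_frames: Sequence[Frame], noise_psd: Frame,
                      alpha: float = 1.0, floor: float = 0.01) -> List[Frame]:
    """Subtract the scaled noise estimate bin by bin.

    Assumption: negative results are clamped to a spectral floor
    (a fraction of the noisy power) to avoid negative power values.
    """
    return [[max(p - alpha * n, floor * p) for p, n in zip(frame, noise_psd)]
            for frame in power_frames]


def enhance_channels(channels: Sequence[Sequence[Frame]], vad_flags: Sequence[bool],
                     alpha: float = 1.0, floor: float = 0.01) -> List[List[Frame]]:
    """Apply spectral subtraction independently to each microphone channel,
    each with its own noise estimate, using one shared VAD decision per frame."""
    return [spectral_subtract(ch, estimate_noise_psd(ch, vad_flags), alpha, floor)
            for ch in channels]


if __name__ == "__main__":
    # Toy example: one channel, three frames, two frequency bins.
    channel = [[1.0, 2.0], [5.0, 6.0], [1.0, 2.0]]
    vad = [False, True, False]  # only the middle frame contains speech
    print(enhance_channels([channel], vad))
```

In a complete pipeline along the lines of the abstract, the enhanced channels would feed the speaker localization (time-delay estimation) step, and the resulting position estimates would then steer a delay-and-sum beamformer. Estimating noise separately per channel lets each microphone's own noise profile be removed, which matters in a car, where the noise field can differ between microphone positions.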
Keywords :
acoustic signal detection; audio-visual systems; microphone arrays; speech recognition; AVICAR corpus; audio-visual voice activity detection; delay-and-sum beamforming; hands-free speech recognition; microphone array processing; multichannel audio signal enhancement; noise reduction; noisy car environment; non-speech segments; power spectral subtraction; speaker localization; speaker positions; Feature extraction; Hidden Markov models; Mouth; Noise; Speech; Visualization; automatic speech recognition; time of arrival estimation;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Hands-free Speech Communication and Microphone Arrays (HSCMA), 2011 Joint Workshop on
Conference_Location :
Edinburgh
Print_ISBN :
978-1-4577-0997-5
Type :
conf
DOI :
10.1109/HSCMA.2011.5942412
Filename :
5942412