DocumentCode
1224405
Title
Speech Enhancement and Recognition in Meetings With an Audio–Visual Sensor Array
Author
Maganti, Hari Krishna ; Gatica-Perez, Daniel ; McCowan, Iain
Author_Institution
Inst. of Neural Inf. Process., Univ. of Ulm, Ulm
Volume
15
Issue
8
fYear
2007
Firstpage
2257
Lastpage
2269
Abstract
This paper addresses the problem of distant speech acquisition in multiparty meetings using multiple microphones and cameras. Microphone array beamforming techniques present a potential alternative to close-talking microphones by providing speech enhancement through spatial filtering. Beamforming techniques, however, rely on knowledge of the speaker location. In this paper, we present an integrated approach in which an audio-visual multiperson tracker is used to track active speakers with high accuracy. Speech enhancement is then achieved using microphone array beamforming followed by a novel postfiltering stage. Finally, speech recognition is performed to evaluate the quality of the enhanced speech signal. The approach is evaluated on data recorded in a real meeting room for stationary-speaker, moving-speaker, and overlapping-speech scenarios. The results show that the speech enhancement and recognition performance achieved using our approach is significantly better than that of a single table-top microphone and is comparable to that of a lapel microphone for some of the scenarios. The results also indicate that the audio-visual system performs significantly better than an audio-only system in terms of both enhancement and recognition. This shows that the accurate speaker tracking provided by the audio-visual sensor array improves recognition performance in a microphone array-based speech recognition system.
Keywords
array signal processing; audio signal processing; audio-visual systems; filtering theory; microphone arrays; speaker recognition; speech enhancement; tracking filters; audio-visual multiperson tracker; audio-visual sensor array; distant speech acquisition problem; microphone array beamforming techniques; multiparty meetings; postfiltering stage; speech enhancement; speech recognition; Array signal processing; Cameras; Filtering; Microphone arrays; Performance evaluation; Sensor arrays; Sensor systems; Speech analysis; Speech enhancement; Speech recognition; Audio–visual fusion; microphone array processing; multiobject tracking; speech enhancement; speech recognition;
fLanguage
English
Journal_Title
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher
IEEE
ISSN
1558-7916
Type
jour
DOI
10.1109/TASL.2007.906197
Filename
4317572