DocumentCode :
2971163
Title :
Audiovisual arrays for untethered spoken interfaces
Author :
Wilson, Kevin ; Rangarajan, Vibhav ; Checka, Neal ; Darrell, Trevor
Author_Institution :
Artificial Intelligence Lab., MIT, Cambridge, MA, USA
fYear :
2002
fDate :
2002
Firstpage :
389
Lastpage :
394
Abstract :
When faced with a distant speaker at a known location in a noisy environment, a microphone array can provide a significantly improved audio signal for speech recognition. Estimating the location of a speaker in a reverberant environment from audio information alone can be quite difficult, so we use an array of video cameras to aid localization. Stereo processing techniques are used on pairs of cameras, and foreground 3-D points are grouped to estimate the trajectory of people as they move in an environment. These trajectories are used to guide a microphone array beamformer. Initial results using this system for speech recognition demonstrate increased recognition rates compared to non-array processing techniques.
Keywords :
speech recognition; speech-based user interfaces; stereo image processing; video cameras; audio signal; audiovisual arrays; microphone array; microphone array beamformer; noisy environment; trajectory estimation; untethered spoken interfaces; Array signal processing; Cameras; Loudspeakers; Microphone arrays; Radar tracking; Reverberation; Sensor arrays; Speech processing; Speech recognition; Working environment noise
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Multimodal Interfaces, 2002. Proceedings. Fourth IEEE International Conference on
Print_ISBN :
0-7695-1834-6
Type :
conf
DOI :
10.1109/ICMI.2002.1167026
Filename :
1167026