DocumentCode :
178700
Title :
Gaze-enhanced speech recognition
Author :
Slaney, M. ; Rajan, Radha ; Stolcke, Andreas ; Parthasarathy, Partha
Author_Institution :
Microsoft Corp., Mountain View, CA, USA
fYear :
2014
fDate :
4-9 May 2014
Firstpage :
3236
Lastpage :
3240
Abstract :
This work demonstrates through simulations and experimental work the potential of eye-gaze data to improve speech-recognition results. Multimodal interfaces, where users see information on a display and use their voice to control an interaction, are of growing importance as mobile phones and tablets grow in popularity. We demonstrate an improvement in speech-recognition performance, as measured by word error rate, by rescoring the output from a large-vocabulary speech-recognition system. We use eye-gaze data as a spotlight and collect bigram word statistics near where the user looks in time and space. We see a 25% relative reduction in the word error rate over a generic language model, and approximately a 10% reduction in errors over a strong, page-specific baseline language model.
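The abstract describes rescoring ASR hypotheses with a gaze-derived bigram language model interpolated with a generic one. The sketch below is an illustrative reconstruction, not the paper's implementation: the add-one smoothing, the log-linear interpolation weight `lam`, and all function names are assumptions chosen for a self-contained example.

```python
import math
from collections import Counter


def bigram_logprob(words, bigrams, unigrams, vocab_size):
    # Add-one-smoothed bigram log-probability (illustrative smoothing choice).
    lp = 0.0
    for w1, w2 in zip(words, words[1:]):
        p = (bigrams[(w1, w2)] + 1) / (unigrams[w1] + vocab_size)
        lp += math.log(p)
    return lp


def rescore(hypotheses, gaze_words, generic_words, lam=0.7):
    """Pick the ASR hypothesis scoring highest under an interpolation of a
    gaze-spotlight bigram model and a generic bigram model (hypothetical
    log-linear combination; `lam` weights the gaze model)."""
    def stats(tokens):
        return Counter(zip(tokens, tokens[1:])), Counter(tokens)

    g_bi, g_uni = stats(gaze_words)      # statistics from words near the gaze
    b_bi, b_uni = stats(generic_words)   # generic background statistics
    vocab = len(set(gaze_words) | set(generic_words))

    best, best_score = None, float("-inf")
    for hyp in hypotheses:
        toks = hyp.split()
        score = (lam * bigram_logprob(toks, g_bi, g_uni, vocab)
                 + (1 - lam) * bigram_logprob(toks, b_bi, b_uni, vocab))
        if score > best_score:
            best, best_score = hyp, score
    return best
```

With gaze statistics drawn from text near the fixation point, a hypothesis matching the on-screen wording outscores an acoustically similar but off-topic one.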
Keywords :
mobile handsets; speech recognition; eye-gaze data; gaze-enhanced speech recognition; generic language model; large-vocabulary speech-recognition system; mobile phones; multimodal interfaces; tablets; Acoustics; Error analysis; Interpolation; Noise; Speech; Speech recognition; Visualization; Eye Gaze; Pointing; Speech Recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location :
Florence
Type :
conf
DOI :
10.1109/ICASSP.2014.6854198
Filename :
6854198