Regional Information Center for Science and Technology

DocumentCode :

178700

Title :

Gaze-enhanced speech recognition

Author :

Slaney, M. ; Rajan, Radha ; Stolcke, Andreas ; Parthasarathy, Partha

Author_Institution :

Microsoft Corp., Mountain View, CA, USA

fYear :

2014

fDate :

4-9 May 2014

Firstpage :

3236

Lastpage :

3240

Abstract :

This work demonstrates through simulations and experimental work the potential of eye-gaze data to improve speech-recognition results. Multimodal interfaces, where users see information on a display and use their voice to control an interaction, are of growing importance as mobile phones and tablets grow in popularity. We demonstrate an improvement in speech-recognition performance, as measured by word error rate, by rescoring the output from a large-vocabulary speech-recognition system. We use eye-gaze data as a spotlight and collect bigram word statistics near to where the user looks in time and space. We see a 25% relative reduction in the word-error rate over a generic language model, and approximately a 10% reduction in errors over a strong, page-specific baseline language model.
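The rescoring idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the add-alpha smoothing, the interpolation weight, and all numbers below are assumptions chosen for illustration. The sketch collects bigram counts from the words near the gaze point, uses them to rescore an N-best list from a recognizer, and shows how a relative word-error-rate reduction is computed.

```python
import math
from collections import Counter


def bigram_counts(words):
    """Count adjacent word pairs in the text near the gaze point."""
    return Counter(zip(words, words[1:]))


def gaze_bigram_logprob(hypothesis, counts, vocab_size, alpha=1.0):
    """Add-alpha smoothed bigram log-probability of a hypothesis.

    The smoothing scheme is an assumption for this sketch.
    """
    # Total count of each word appearing as the first element of a bigram.
    first_word_totals = Counter()
    for (w1, _), c in counts.items():
        first_word_totals[w1] += c

    words = hypothesis.split()
    total = 0.0
    for w1, w2 in zip(words, words[1:]):
        numerator = counts[(w1, w2)] + alpha
        denominator = first_word_totals[w1] + alpha * vocab_size
        total += math.log(numerator / denominator)
    return total


def rescore(nbest, counts, vocab_size, weight=0.5):
    """Pick the hypothesis with the best combined score.

    nbest is a list of (text, recognizer_log_score) pairs; weight is a
    hypothetical tuning parameter for the gaze-based score.
    """
    def combined(entry):
        text, recognizer_score = entry
        return recognizer_score + weight * gaze_bigram_logprob(
            text, counts, vocab_size
        )

    return max(nbest, key=combined)[0]


def relative_reduction(baseline_wer, new_wer):
    """Relative reduction in word error rate, as a fraction of the baseline."""
    return (baseline_wer - new_wer) / baseline_wer


# Illustrative data only: words the user was looking at, and two
# recognizer hypotheses with made-up log scores.
gazed_words = "cheap flights to florence".split()
counts = bigram_counts(gazed_words)
nbest = [
    ("cheap flights to florins", -10.0),
    ("cheap flights to florence", -10.2),
]
best = rescore(nbest, counts, vocab_size=1000)
```

In this toy case the recognizer alone slightly prefers the wrong hypothesis, and the gaze-derived bigram score tips the choice toward the words that appear on screen. With made-up error rates, `relative_reduction(0.20, 0.15)` evaluates to about 0.25, which is what a 25% relative reduction means.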

Keywords :

mobile handsets; speech recognition; eye-gaze data; gaze-enhanced speech recognition; generic language model; large-vocabulary speech-recognition system; mobile phones; multimodal interfaces; tablets; Acoustics; Error analysis; Interpolation; Noise; Speech; Speech recognition; Visualization; Eye Gaze; Pointing; Speech Recognition;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on

Conference_Location :

Florence

Type :

conf

DOI :

10.1109/ICASSP.2014.6854198

Filename :

6854198

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=178700