  • DocumentCode
    178700
  • Title
    Gaze-enhanced speech recognition
  • Author
    Slaney, M. ; Rajan, Radha ; Stolcke, Andreas ; Parthasarathy, Partha
  • Author_Institution
    Microsoft Corp., Mountain View, CA, USA
  • fYear
    2014
  • fDate
    4-9 May 2014
  • Firstpage
    3236
  • Lastpage
    3240
  • Abstract
    This work demonstrates, through simulations and experiments, the potential of eye-gaze data to improve speech-recognition results. Multimodal interfaces, where users see information on a display and use their voice to control an interaction, are increasingly important as mobile phones and tablets grow in popularity. We demonstrate an improvement in speech-recognition performance, as measured by word error rate, by rescoring the output of a large-vocabulary speech-recognition system. We use eye-gaze data as a spotlight and collect bigram word statistics near where the user looks, in both time and space. We see a 25% relative reduction in word error rate compared with a generic language model, and approximately a 10% reduction in errors compared with a strong, page-specific baseline language model.
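The abstract describes the method only at a high level, so the following is a minimal sketch under stated assumptions rather than the authors' implementation. It assumes each displayed word has an (x, y) screen position, gaze arrives as (time, x, y) samples, the "spotlight" is a Gaussian-weighted disc around each fixation limited to a time window before the utterance, and the gaze-derived bigram model is linearly interpolated with a generic bigram model to rescore an N-best list. All function names and parameters (`gaze_bigram_counts`, `radius`, `time_window`, `lam`, `lm_weight`) are hypothetical; the paper's actual spotlight shape, window sizes, and interpolation scheme are not given in this record.

```python
import math
from collections import defaultdict

# Hypothetical sketch: the spotlight shape, window sizes, and interpolation
# scheme below are illustrative assumptions, not the paper's exact method.


def gaze_bigram_counts(words, positions, gaze, utt_time,
                       radius=100.0, time_window=5.0):
    """Accumulate soft bigram counts from on-screen text near recent gaze.

    words     -- displayed words in reading order
    positions -- (x, y) screen coordinates of each word (pixels)
    gaze      -- iterable of (time_s, x, y) gaze samples
    utt_time  -- time (s) at which the utterance is spoken
    """
    counts = defaultdict(float)
    sigma = radius / 2.0
    for t, gx, gy in gaze:
        # Spotlight in time: only gaze shortly before the utterance counts.
        if not (utt_time - time_window <= t <= utt_time):
            continue
        for i in range(len(words) - 1):
            # Place each bigram at the midpoint of its two words.
            x = (positions[i][0] + positions[i + 1][0]) / 2.0
            y = (positions[i][1] + positions[i + 1][1]) / 2.0
            d2 = (x - gx) ** 2 + (y - gy) ** 2
            # Spotlight in space: Gaussian-weighted, cut off at `radius`.
            if d2 <= radius ** 2:
                counts[(words[i], words[i + 1])] += math.exp(-d2 / (2 * sigma ** 2))
    return counts


def interpolated_logprob(sentence, gaze_counts, generic_bigram,
                         lam=0.5, floor=1e-6):
    """Log-probability under lam * P_gaze + (1 - lam) * P_generic."""
    context_totals = defaultdict(float)
    for (w1, _), c in gaze_counts.items():
        context_totals[w1] += c
    logp = 0.0
    for w1, w2 in zip(sentence, sentence[1:]):
        total = context_totals.get(w1, 0.0)
        p_gaze = gaze_counts.get((w1, w2), 0.0) / total if total > 0 else 0.0
        p_gen = generic_bigram.get((w1, w2), floor)
        logp += math.log(lam * p_gaze + (1 - lam) * p_gen)
    return logp


def rescore(nbest, gaze_counts, generic_bigram, lm_weight=1.0, lam=0.5):
    """Pick the hypothesis with the best acoustic + weighted LM log score.

    nbest -- list of (word_list, acoustic_log_score) pairs from the recognizer
    """
    return max(nbest, key=lambda h: h[1] + lm_weight *
               interpolated_logprob(h[0], gaze_counts, generic_bigram, lam))


def relative_reduction(baseline_wer, new_wer):
    """Relative WER reduction, e.g. 0.20 -> 0.15 is a 25% relative reduction."""
    return (baseline_wer - new_wer) / baseline_wer
```

In a toy setup where the on-screen text reads "turn left at main street", the user fixates on "main street", and the generic model scores "main" and "maine" equally, the gaze-weighted model promotes "turn left at main street" over an acoustically preferred "turn left at maine street"; with no gaze samples, the generic model alone leaves the acoustic choice unchanged. The WER figures in the `relative_reduction` docstring are hypothetical and only illustrate how a relative reduction such as the abstract's 25% is computed.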
  • Keywords
    mobile handsets; speech recognition; eye-gaze data; gaze-enhanced speech recognition; generic language model; large-vocabulary speech-recognition system; mobile phones; multimodal interfaces; tablets; Acoustics; Error analysis; Interpolation; Noise; Speech; Speech recognition; Visualization; Eye Gaze; Pointing; Speech Recognition;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
  • Conference_Location
    Florence
  • Type
    conf
  • DOI
    10.1109/ICASSP.2014.6854198
  • Filename
    6854198