Gaze-contingent ASR for spontaneous, conversational speech: An evaluation

Author

Cooke, Neil; Russell, Martin

Author_Institution

Multi-modal Interaction Lab., Birmingham Univ., Birmingham

fYear

2008

fDate

March 31 2008-April 4 2008

Firstpage

4433

Lastpage

4436

Abstract

There has been little work that attempts to improve the recognition of spontaneous, conversational speech by adding information from a loosely-coupled modality. This study investigated this idea by integrating information from gaze into an ASR system. A probabilistic framework for multimodal recognition was formalised and applied to the specific case of integrating gaze and speech. Gaze-contingent ASR systems were developed from a baseline ASR system by redistributing language model probability mass according to the visual attention. The best performing systems had similar Word Error Rates to the baseline ASR system and showed an increase in keyword spotting accuracy. The key finding was that performance improvements observed were due to increased recognition accuracy for words associated with the visual field but not the current focus of visual attention.

Keywords

speech recognition; word processing; automatic speech recognition; gaze-contingent ASR; keyword spotting accuracy; language model probability mass; loosely-coupled modality; spontaneous conversational speech; visual attention; word error rates; Automatic speech recognition; Error analysis; Human computer interaction; Laboratories; Maximum likelihood decoding; Speech analysis; Speech recognition; User interfaces; Visual system; Vocabulary; Bayes procedures

fLanguage

English

Publisher

IEEE

Conference_Title

Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on

Conference_Location

Las Vegas, NV

ISSN

1520-6149

Print_ISBN

978-1-4244-1483-3

Electronic_ISBN

1520-6149

Type

conf

DOI

10.1109/ICASSP.2008.4518639

Filename

4518639