An audiovisual attention model for natural conversation scenes

Author

Coutrot, Antoine ; Guyader, Nathalie

Author_Institution

GIPSA-lab, Grenoble-Alpes Univ., Grenoble, France

fYear

2014

fDate

27-30 Oct. 2014

Firstpage

1100

Lastpage

1104

Abstract

Classical visual attention models consider neither social cues, such as faces, nor auditory cues, such as speech. However, faces are known to capture visual attention more than any other visual feature, and recent studies have shown that speech turn-taking affects the gaze of non-involved viewers. In this paper, we propose an audiovisual saliency model that predicts the eye movements of observers watching other people having a conversation. Using a speaker diarization algorithm, the model increases the saliency of speakers relative to addressees. We evaluated the model against eye-tracking data and found that it significantly outperforms visual attention models that assign an equal and constant saliency value to all faces.
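
The core idea of the abstract, weighting speaker faces above addressee faces, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `face_saliency_map`, the Gaussian face blobs, and the weights `speaker_weight` and `listener_weight` are all assumptions chosen for illustration, and the per-face speaking flags are assumed to come from some external speaker diarization step.

```python
import numpy as np

def face_saliency_map(shape, faces, speaking,
                      speaker_weight=2.0, listener_weight=1.0):
    """Build a face-based saliency map for one video frame.

    shape    : (height, width) of the frame
    faces    : list of (cx, cy, sigma) face centers and spreads, in pixels
    speaking : list of bool, True if that face is the current speaker
               (assumed to be provided by a diarization step)
    The weights are hypothetical values, not taken from the paper.
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    sal = np.zeros((h, w), dtype=float)
    for (cx, cy, sigma), is_speaking in zip(faces, speaking):
        weight = speaker_weight if is_speaking else listener_weight
        # One isotropic Gaussian blob per face, scaled by its weight.
        blob = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
        sal += weight * blob
    total = sal.sum()
    # Normalize so the map sums to 1, like a probability distribution.
    return sal / total if total > 0 else sal

# Two faces; the first one is speaking.
faces = [(20, 30, 5.0), (80, 30, 5.0)]
speaking = [True, False]
sal_map = face_saliency_map((60, 100), faces, speaking)
```

Setting both weights to the same value gives the kind of baseline the abstract compares against, where all faces receive an equal and constant saliency.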

Keywords

audio-visual systems; gaze tracking; image processing; speech processing; audiovisual attention model; audiovisual saliency model; classical visual attention models; eye movement prediction; eye-tracking data; natural conversation scenes; social cues; speaker diarization algorithm; Computational modeling; Feature extraction; Observers; Predictive models; Speech; Videos; Visualization; eye movements; social gaze; speaker diarization

fLanguage

English

Publisher

IEEE

Conference_Titel

2014 IEEE International Conference on Image Processing (ICIP)

Conference_Location

Paris

Type

conf

DOI

10.1109/ICIP.2014.7025219

Filename

7025219