• DocumentCode
    29517
  • Title
    What/Where to Look Next? Modeling Top-Down Visual Attention in Complex Interactive Environments
  • Author
    Borji, Ali; Sihite, Dicky N.; Itti, Laurent
  • Author_Institution
    Dept. of Comput. Sci., Univ. of Southern California, Los Angeles, CA, USA
  • Volume
    44
  • Issue
    5
  • fYear
    2014
  • fDate
    May 2014
  • Firstpage
    523
  • Lastpage
    538
  • Abstract
    Several visual attention models have been proposed for describing eye movements over simple stimuli and tasks such as free viewing or visual search. Yet, to date, there exists no computational framework that can reliably mimic human gaze behavior in more complex environments and tasks such as urban driving. In addition, benchmark datasets, scoring techniques, and top-down model architectures are not yet well understood. In this paper, we describe new task-dependent approaches for modeling top-down overt visual attention based on graphical models for probabilistic inference and reasoning. We describe a dynamic Bayesian network that infers probability distributions over attended objects and spatial locations directly from observed data. Probabilistic inference in our model is performed over object-related functions that are fed from manual annotations of objects in video scenes or by state-of-the-art object detection/recognition algorithms. Evaluating on approximately 3 h (approximately 315 000 eye fixations and 12 000 saccades) of data from observers playing three video games (time-scheduling, driving, and flight combat), we show that our approach is significantly more predictive of eye fixations compared to: 1) simpler classifier-based models, also developed here, that map a signature of a scene (multimodal information from gist, bottom-up saliency, physical actions, and events) to eye positions; 2) 14 state-of-the-art bottom-up saliency models; and 3) brute-force algorithms such as mean eye position. Our results show that the proposed model is more effective in employing and reasoning over spatio-temporal visual data compared with the state-of-the-art.
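    The abstract describes a dynamic Bayesian network that infers a probability distribution over attended objects from observed data. As a purely illustrative sketch (not the authors' implementation), the core of such inference is a forward-filtering step: predict the attended object from a transition model, then reweight by the current frame's evidence. All names and numbers below are hypothetical.

    ```python
    import numpy as np

    def dbn_forward_step(belief, transition, likelihood):
        """One HMM-style forward-filtering step over K candidate objects.

        belief:     P(attended object at t-1), shape (K,)
        transition: P(object_t | object_{t-1}), shape (K, K), rows sum to 1
        likelihood: P(frame evidence | object_t), shape (K,)
        Returns the normalized posterior P(object_t | evidence up to t).
        """
        predicted = belief @ transition      # predict: marginalize over previous state
        posterior = predicted * likelihood   # weight by current-frame evidence
        return posterior / posterior.sum()   # renormalize to a distribution

    # Toy example with K = 3 hypothetical objects (e.g., car, pedestrian, sign).
    belief = np.array([0.5, 0.3, 0.2])
    transition = np.array([[0.8, 0.1, 0.1],
                           [0.2, 0.7, 0.1],
                           [0.3, 0.2, 0.5]])
    likelihood = np.array([0.1, 0.8, 0.1])   # evidence favors the second object
    posterior = dbn_forward_step(belief, transition, likelihood)
    ```

    Chaining this step over frames yields a per-frame distribution over attended objects, which can then be mapped to spatial fixation predictions.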
  • Keywords
    behavioural sciences computing; belief networks; image classification; inference mechanisms; object detection; object recognition; statistical distributions; bottom-up saliency models; brute-force algorithms; complex interactive environments; dynamic Bayesian network; eye movements; free viewing task; graphical models; human gaze behavior; object detection algorithm; object recognition algorithm; object-related functions; probabilistic inference; probability distributions; reasoning; scoring techniques; simpler classifier-based models; spatio-temporal visual data; task-dependent approach; top-down model architectures; top-down overt visual attention modeling; urban driving task; video scenes; visual search task; Analytical models; Computational modeling; Games; Hidden Markov models; Predictive models; Solid modeling; Visualization; Bottom-up saliency; complex natural scenes; eye movement prediction; gaze prediction; interactive environments; task-driven attention; top-down attention; visual attention;
  • fLanguage
    English
  • Journal_Title
    IEEE Transactions on Systems, Man, and Cybernetics: Systems
  • Publisher
    IEEE
  • ISSN
    2168-2216
  • Type
    jour
  • DOI
    10.1109/TSMC.2013.2279715
  • Filename
    6613519