A model of attention-driven scene analysis

Author

Slaney, M. ; Agus, T. ; Liu, Shih-Chii ; Kaya, M. ; Elhilali, Mounya

fYear

2012

fDate

25-30 March 2012

Firstpage

145

Lastpage

148

Abstract

Parsing complex acoustic scenes involves an intricate interplay between bottom-up, stimulus-driven salient elements in the scene and top-down, goal-directed mechanisms that shift our attention to particular parts of the scene. Here, we present a framework for exploring the interaction between these two processes in a simulated cocktail party setting. The model shows improved digit recognition in a multi-talker environment with a goal of tracking the source uttering the highest value. This work highlights the relevance of both data-driven and goal-driven processes in tackling real multi-talker, multi-source sound analysis.

Keywords

audio signal processing; cognition; hearing; attention-driven scene analysis; bottom-up stimulus-driven salient elements; cocktail party setting; data-driven processes; digit recognition; goal-directed mechanisms; goal-driven processes; multisource sound analysis; multitalker environment; parsing complex acoustic scenes; tackling real multitalker; top-down mechanisms; Analytical models; Brain modeling; Cognition; Image analysis; Speech; Speech recognition; Switches; Attention; Auditory Scene Analysis; Cognition; Digit Recognition; Saliency;

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on

Conference_Location

Kyoto

ISSN

1520-6149

Print_ISBN

978-1-4673-0045-2

Electronic_ISBN

1520-6149

Type

conf

DOI

10.1109/ICASSP.2012.6287838

Filename

6287838