Title :
Audiovisual Gestalts
Author :
Monaci, Gianluca ; Vandergheynst, Pierre
Author_Institution :
Ecole Polytechnique Federale de Lausanne (EPFL), Switzerland
Abstract :
This paper presents an algorithm to correlate audio and visual data generated by the same physical phenomenon. According to psychophysical experiments, temporal synchrony strongly contributes to the integration of cross-modal information in humans. Thus, we define meaningful audiovisual structures as temporally proximal audio-video events. Audio and video signals are represented as sparse decompositions over redundant dictionaries of functions, which makes it possible to define perceptually meaningful audiovisual events. These cross-modal structures are detected using a simple rule known as the Helmholtz principle. Experimental results show that by extracting significant synchronous audiovisual events, we can detect the cross-modal correlation between the signals even in the presence of distracting motion and acoustic noise. These results confirm that temporal proximity between audiovisual events is a key ingredient for the integration of information across modalities, and that it can be effectively exploited in the design of multi-modal analysis algorithms.
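Note: as a rough illustration of the detection principle described above (not the authors' implementation), the following Python sketch applies a Helmholtz-principle / a-contrario style test to the temporal co-occurrence of audio and video events. Event extraction is reduced to given lists of event times, and all function names, the temporal window, and the uniform-background assumption are illustrative choices, not details taken from the paper.

# Hypothetical sketch of an a-contrario (Helmholtz-principle style) test
# for temporal synchrony between audio and video events.
from math import comb

def cooccurrences(audio_times, video_times, window):
    """Count video events lying within +/- window of some audio event."""
    return sum(
        any(abs(v - a) <= window for a in audio_times) for v in video_times
    )

def binomial_tail(k, n, p):
    """P[X >= k] for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def number_of_false_alarms(audio_times, video_times, window, duration, n_tests=1):
    """
    Expected number of configurations at least this synchronous under the
    null hypothesis that audio and video events are independent and spread
    uniformly over the sequence duration.  A value much smaller than 1
    flags the audiovisual association as meaningful (non-accidental).
    """
    k = cooccurrences(audio_times, video_times, window)
    n = len(video_times)
    # Union bound on the chance that one random video event falls near
    # any of the audio events.
    p = min(1.0, len(audio_times) * 2 * window / duration)
    return n_tests * binomial_tail(k, n, p)

if __name__ == "__main__":
    audio = [1.0, 2.1, 3.0, 4.2, 5.1]          # audio event times (s)
    video = [1.05, 2.0, 3.1, 4.25, 5.0, 7.8]   # video event times (s)
    nfa = number_of_false_alarms(audio, video, window=0.15, duration=10.0)
    print(f"NFA = {nfa:.2e} -> {'meaningful' if nfa < 1 else 'accidental'}")

In this toy setting, five of the six video events fall within the window of an audio event, so the number of false alarms is far below 1 and the co-occurrence is declared meaningful; with independent, scattered events the tail probability stays large and no structure is detected.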
Keywords :
Acoustic noise; Acoustic signal detection; Algorithm design and analysis; Data mining; Dictionaries; Event detection; Humans; Information analysis; Motion detection; Psychology;
Conference_Title :
2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW '06)
Print_ISBN :
0-7695-2646-2
DOI :
10.1109/CVPRW.2006.34