DocumentCode :
454831
Title :
A Mid-Level Scene Change Representation Via Audiovisual Alignment
Author :
Wang, Jinqiao ; Duan, Lingyu ; Lu, Hanqing ; Jin, Jesse S. ; Xu, Changsheng
Author_Institution :
Nat. Lab of Pattern Recognition, Chinese Acad. of Sci., Beijing
Volume :
2
fYear :
2006
fDate :
14-19 May 2006
Abstract :
A scene is a series of semantically correlated video shots. Effective scene detection depends to some extent on domain knowledge. Most existing approaches try to detect scene changes directly by applying clustering or supervised learning methods to low-level audiovisual features. However, robustly detecting diverse scene changes derived from complex semantic meanings is still a challenging problem. In this paper we focus on the association of visual signal changes (e.g. cuts, fade-ins, fade-outs) and audio signal changes (e.g. speaker changes, background music changes) to propose a mid-level scene change representation, which is meant to locate candidate scene change points by characterizing the temporally uncorrelated properties of the audio and visual tracks when a scene change occurs. By incorporating domain knowledge, enhanced features can be further extracted to complement this representation and bridge the semantic gap towards scene change detection. We use a camera motion estimation algorithm to detect visual signal changes. These visual change positions are selected as time-stamp points. An alignment is then performed to search for candidate audio signal change positions by computing the multi-scale Kullback-Leibler (K-L) distance. Both a metric-based K-L distance approach and a model-based HMM are applied to determine true audio signal changes. The associated visual and audio signal changes are taken as the mid-level scene change representation. This representation has been successfully applied to detect the boundaries of individual commercials in TV broadcast streams with an accuracy of around 95%. In particular, the systematic alignment approach can be utilized in video summarization.
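The multi-scale K-L distance step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the audio features, window lengths, or distribution model, so everything here is an assumption: each window of feature vectors is modelled as a diagonal Gaussian, the symmetric K-L distance is computed between the windows to the left and right of a visual change position `t`, and the helper names `gaussian_stats`, `symmetric_kl`, and `multiscale_kl` are hypothetical.

```python
from statistics import fmean, pvariance


def gaussian_stats(window):
    """Per-dimension mean and variance of a list of feature vectors."""
    dims = list(zip(*window))
    means = [fmean(d) for d in dims]
    # Assumed variance floor so the ratios below never divide by zero.
    variances = [max(pvariance(d), 1e-6) for d in dims]
    return means, variances


def symmetric_kl(left, right):
    """Symmetric K-L distance between diagonal Gaussians fitted to two windows."""
    (m1, v1), (m2, v2) = gaussian_stats(left), gaussian_stats(right)
    total = 0.0
    for a, p, b, q in zip(m1, v1, m2, v2):
        diff2 = (a - b) ** 2
        # KL(P||Q) + KL(Q||P) for one dimension; the log terms cancel.
        total += 0.5 * (p / q + q / p - 2.0) + 0.5 * diff2 * (1.0 / p + 1.0 / q)
    return total


def multiscale_kl(features, t, scales=(5, 10, 20)):
    """K-L distance around frame index t for several window lengths (assumed scales)."""
    scores = {}
    for w in scales:
        if t - w < 0 or t + w > len(features):
            continue  # not enough audio frames on one side at this scale
        scores[w] = symmetric_kl(features[t - w:t], features[t:t + w])
    return scores


# Synthetic one-dimensional features with an abrupt change at frame 20.
features = [[0.1 * (i % 3)] for i in range(20)] + [[5.0 + 0.1 * (i % 3)] for i in range(20)]
at_change = multiscale_kl(features, 20)  # windows straddle the change
inside = multiscale_kl(features, 10)     # windows lie within one segment
```

In this sketch a visual change position whose distance is large at several scales would be kept as a candidate audio change; how the paper combines scales or sets thresholds is not stated in the abstract.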
Keywords :
audio signal processing; image motion analysis; image representation; learning (artificial intelligence); video signal processing; TV broadcast stream; audio signal changes; audiovisual alignment; camera motion estimation algorithm; mid-level scene change representation; multiscale Kullback-Leibler distance computing; scene detection; semantic correlated video shots; supervised learning methods; Bridges; Cameras; Change detection algorithms; Hidden Markov models; Layout; Motion detection; Motion estimation; Multiple signal classification; Robustness; Supervised learning;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on
Conference_Location :
Toulouse
ISSN :
1520-6149
Print_ISBN :
1-4244-0469-X
Type :
conf
DOI :
10.1109/ICASSP.2006.1660366
Filename :
1660366