DocumentCode :
2482725
Title :
Audio-visual event classification via spatial-temporal-audio words
Author :
Cao, Yu ; Baang, Sung ; Liu, Shih-Hsi Alex ; Li, Ming ; Hu, Sanqing
Author_Institution :
Dept. of Comput. Sci., California State Univ., Fresno, CA
fYear :
2008
fDate :
8-11 Dec. 2008
Firstpage :
1
Lastpage :
5
Abstract :
In this paper, we propose a generative model-based approach to audio-visual event classification. The approach builds on a new unsupervised learning method that uses an extended probabilistic latent semantic analysis (pLSA) model. We represent each video clip as a collection of spatial-temporal-audio words, generated by fusing visual and audio features within the pLSA model. Each audio-visual event class is treated as a latent topic in this model. The probability distributions of the spatial-temporal-audio words are learned from training examples, which consist of video sequences representing different types of audio-visual events. Experimental results show the effectiveness of the proposed approach.
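The paper itself includes no code; as a rough illustration of the latent-topic machinery the abstract describes, the sketch below fits a standard pLSA model by EM to a document-word count matrix, where each row would hold the spatial-temporal-audio word counts of one video clip and each latent topic plays the role of an event class. The function name `plsa_em`, the matrix shapes, and the iteration count are illustrative assumptions, not the authors' extended model.

```python
import numpy as np

def plsa_em(counts, n_topics, n_iter=100, seed=0):
    """Fit basic pLSA with EM (illustrative sketch, not the paper's model).

    counts   : (n_docs, n_words) matrix of spatial-temporal-audio word
               counts, one row per video clip.
    n_topics : number of latent topics (here, audio-visual event classes).
    Returns P(z|d) of shape (n_docs, n_topics)
        and P(w|z) of shape (n_topics, n_words).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    # Random initialisation, normalised to valid probability distributions.
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # E-step: posterior P(z | d, w) ∝ P(z|d) P(w|z) for each (doc, word).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]   # (docs, topics, words)
        joint /= joint.sum(axis=1, keepdims=True) + 1e-12
        # M-step: re-estimate both distributions from expected counts
        # n(d, w) * P(z | d, w).
        weighted = counts[:, None, :] * joint
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12

    return p_z_d, p_w_z
```

In this simplified reading, a training clip would be labeled with the topic maximizing P(z|d); the authors' extension presumably differs in how the audio and visual features are fused into the shared vocabulary.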
Keywords :
audio signal processing; image classification; image fusion; image representation; image sequences; probability; semantic networks; spatiotemporal phenomena; unsupervised learning; audio-visual event classification; image fusion; probabilistic latent semantic analysis; probability distribution; spatial-temporal-audio word; unsupervised learning method; video clip representation; Computer science; Graphical models; Humans; Nervous system; Probability distribution; Support vector machine classification; Support vector machines; Surveillance; Unsupervised learning; Video sequences;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
19th International Conference on Pattern Recognition (ICPR 2008)
Conference_Location :
Tampa, FL
ISSN :
1051-4651
Print_ISBN :
978-1-4244-2174-9
Type :
conf
DOI :
10.1109/ICPR.2008.4761474
Filename :
4761474