Audio-visual event detection using duration dependent input output Markov models

Author

Naphade, Milind R. ; Garg, Ashutosh ; Huang, Thomas S.

Author_Institution

IBM Thomas J. Watson Res. Center, Hawthorne, NY, USA

fYear

2001

fDate

2001

Firstpage

39

Lastpage

43

Abstract

Analysis of audio-visual data and detection of semantic events with spatio-temporal support is a challenging multimedia understanding problem. The difficulty lies in the gap between low-level media features and high-level semantic concepts. We introduce a duration dependent input output Markov model (DDIOMM) to detect events based on multiple modalities. The DDIOMM combines the ability to model non-exponential duration densities with the mapping of input sequences to output sequences. We test the DDIOMM by modeling the audio-visual event "explosion". We compare the detection performance of the DDIOMM with that of the input output Markov model (IOMM) and the hidden Markov model (HMM). Experiments reveal that modeling duration improves detection performance.

Keywords

Markov processes; audio-visual systems; feature extraction; multimedia systems; DDIOMM; HMM; audio-visual data analysis; audio-visual event detection; audio-visual event explosion; detection performance; duration dependent input output Markov models; high level semantic concept; input sequences; low level media features; multimedia understanding problem; multiple modalities; nonexponential duration densities; output sequences; semantic event detection; spatio-temporal support; Bayesian methods; Data analysis; Data mining; Event detection; Explosions; Fellows; Hidden Markov models; Motion pictures; Streaming media; Testing;

fLanguage

English

Publisher

IEEE

Conference_Titel

Content-Based Access of Image and Video Libraries, 2001. (CBAIVL 2001). IEEE Workshop on

Conference_Location

Kauai, HI

Print_ISBN

0-7695-1354-9

Type

conf

DOI

10.1109/IVL.2001.990854

Filename

990854