Factor graph framework for semantic video indexing

Author

Naphade, Milind Ramesh ; Kozintsev, Igor V. ; Huang, Thomas S.

Author_Institution

IBM Thomas J. Watson Res. Center, Hawthorne, NY, USA

Volume

12

Issue

1

fYear

2002

fDate

1/1/2002 12:00:00 AM

Firstpage

40

Lastpage

52

Abstract

Video query by semantic keywords is one of the most challenging research issues in video data management. To go beyond low-level similarity and access video data content by semantics, we need to bridge the gap between the low-level representation and high-level semantics. This is a difficult multimedia understanding problem. We formulate this problem as a probabilistic pattern-recognition problem for modeling semantics in terms of concepts and context. To map low-level features to high-level semantics, we propose probabilistic multimedia objects (multijects). Examples of multijects in movies include explosion, mountain, beach, outdoor, and music. Semantic concepts in videos interact and appear in context. To model this interaction explicitly, we propose a network of multijects (multinet). To model the multinet computationally, we propose a factor graph framework that can enforce spatio-temporal constraints. Using probabilistic models for the multijects rocks, sky, snow, water-body, and forestry/greenery, and using a factor graph as the multinet, we demonstrate the application of this framework to semantic video indexing. We demonstrate how detection performance can be significantly improved by using the multinet to take inter-conceptual relationships into account. Our experiments, which use a large video database consisting of clips from several movies and are based on a set of five semantic concepts, reveal an improvement in detection performance of over 22%. We also show how the multinet is extended to take temporal correlation into account. By constructing a dynamic multinet, we show that the detection performance is further enhanced by as much as 12%. With this framework, we show how keyword-based query and semantic filtering are possible for a predetermined set of concepts.

Keywords

content-based retrieval; database indexing; image retrieval; video databases; video signal processing; detection performance; dynamic multinet; factor graph; high-level semantics; keyword-based query; large video database; low-level features; low-level representation; low-level similarity; movies; multijects network; multimedia understanding problem; multinet; probabilistic multimedia objects; probabilistic pattern-recognition; semantic filtering; semantic keywords; semantic video indexing; spatio-temporal constraints; temporal correlation; video data content access; video data management; video query; Bridges; Computational modeling; Context modeling; Databases; Explosions; Filtering; Forestry; Indexing; Motion pictures; Snow

fLanguage

English

Journal_Title

Circuits and Systems for Video Technology, IEEE Transactions on

Publisher

IEEE

ISSN

1051-8215

Type

jour

DOI

10.1109/76.981844

Filename

981844