DocumentCode :
1559979
Title :
Factor graph framework for semantic video indexing
Author :
Naphade, Milind Ramesh ; Kozintsev, Igor V. ; Huang, Thomas S.
Author_Institution :
IBM Thomas J. Watson Res. Center, Hawthorne, NY, USA
Volume :
12
Issue :
1
fYear :
2002
fDate :
1/1/2002
Firstpage :
40
Lastpage :
52
Abstract :
Video query by semantic keywords is one of the most challenging research issues in video data management. To go beyond low-level similarity and access video content by its semantics, we need to bridge the gap between low-level representation and high-level semantics. This is a difficult multimedia understanding problem. We formulate it as a probabilistic pattern-recognition problem, modeling semantics in terms of concepts and context. To map low-level features to high-level semantics, we propose probabilistic multimedia objects (multijects). Examples of multijects in movies include explosion, mountain, beach, outdoor, and music. Semantic concepts in videos interact and appear in context. To model this interaction explicitly, we propose a network of multijects (multinet). To model the multinet computationally, we propose a factor graph framework that can enforce spatio-temporal constraints. Using probabilistic models for the multijects rocks, sky, snow, water-body, and forestry/greenery, and using a factor graph as the multinet, we demonstrate the application of this framework to semantic video indexing. We show that detection performance can be significantly improved by using the multinet to take inter-conceptual relationships into account. Our experiments, which use a large video database of clips from several movies and a set of five semantic concepts, reveal an improvement in detection performance of over 22%. We also show how the multinet can be extended to take temporal correlation into account. By constructing a dynamic multinet, we show that detection performance is further enhanced by as much as 12%. With this framework, we show how keyword-based query and semantic filtering are possible for a predetermined set of concepts.
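The idea of fusing per-concept detectors through a factor graph can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes binary concepts, treats each multiject's output as an independent local probability, and computes marginals by brute-force enumeration rather than by the message passing a factor graph would normally use. The function name `multinet_marginals`, the compatibility table, and all numbers are hypothetical.

```python
from itertools import product


def multinet_marginals(local_probs, pair_factors):
    """Return P(concept present) for each concept after context fusion.

    local_probs:  {concept: detector probability that the concept is present}
    pair_factors: {(a, b): 2x2 table, table[state_a][state_b] >= 0}
                  (hypothetical compatibility weights between two concepts)
    """
    names = list(local_probs)
    weights = {}
    # Enumerate every joint assignment of present (1) / absent (0).
    for bits in product((0, 1), repeat=len(names)):
        state = dict(zip(names, bits))
        w = 1.0
        # Local factors: one per concept, from its own detector.
        for name, p in local_probs.items():
            w *= p if state[name] else 1.0 - p
        # Pairwise factors: encode how concepts co-occur.
        for (a, b), table in pair_factors.items():
            w *= table[state[a]][state[b]]
        weights[bits] = w
    z = sum(weights.values())  # normalizing constant
    return {
        name: sum(w for bits, w in weights.items() if bits[i]) / z
        for i, name in enumerate(names)
    }


# Made-up example: a confident "sky" detection raises belief in "snow".
local = {"sky": 0.9, "snow": 0.4}
context = {("sky", "snow"): [[1.0, 0.5], [1.0, 2.0]]}
marginals = multinet_marginals(local, context)
print(marginals)
```

Enumeration is exponential in the number of concepts, so it only serves to show what the fused marginals mean for a handful of concepts; it says nothing about the detection gains reported in the abstract.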
Keywords :
content-based retrieval; database indexing; image retrieval; video databases; video signal processing; detection performance; dynamic multinet; factor graph; high-level semantics; keyword-based query; large video database; low-level features; low-level representation; low-level similarity; movies; multijects network; multimedia understanding problem; multinet; probabilistic multimedia objects; probabilistic pattern-recognition; semantic filtering; semantic keywords; semantic video indexing; spatio-temporal constraints; temporal correlation; video data content access; video data management; video query; Bridges; Computational modeling; Context modeling; Databases; Explosions; Filtering; Forestry; Indexing; Motion pictures; Snow;
fLanguage :
English
Journal_Title :
Circuits and Systems for Video Technology, IEEE Transactions on
Publisher :
IEEE
ISSN :
1051-8215
Type :
jour
DOI :
10.1109/76.981844
Filename :
981844