Enhancing Video Event Recognition Using Automatically Constructed Semantic-Visual Knowledge Base

Author

Xishan Zhang ; Yang Yang ; Yongdong Zhang ; Huanbo Luan ; Jintao Li ; Hanwang Zhang ; Tat-Seng Chua

Author_Institution

Key Lab. of Intell. Inf. Process., Inst. of Comput. Technol., Beijing, China

Volume

17

Issue

9

fYear

2015

fDate

Sept. 2015

Firstpage

1562

Lastpage

1575

Abstract

The task of recognizing events from video has attracted considerable attention in recent years. However, due to the complex nature of user-defined events, purely audio-visual content analysis without domain knowledge has been found to be grossly inadequate. In this paper, we propose to construct a semantic-visual knowledge base that encodes the rich event-centric concepts and their relationships from well-established lexical databases, including FrameNet, as well as the concept-specific visual knowledge from ImageNet. Based on this semantic-visual knowledge base, we design an effective system for video event recognition. Specifically, in order to narrow the semantic gap between high-level complex events and low-level visual representations, we use the event-centric semantic concepts encoded in the knowledge base as the intermediate-level event representation, which offers both human-perceivable and machine-interpretable semantic clues for event recognition. In addition, in order to leverage the abundant ImageNet images, we propose a robust transfer learning model to learn noise-resistant concept classifiers for videos. Extensive experiments on various real-world video datasets demonstrate the superiority of our proposed system over state-of-the-art approaches.

Keywords

image classification; knowledge based systems; learning (artificial intelligence); video signal processing; FrameNet; ImageNet images; audio-visual content analysis; automatically constructed semantic-visual knowledge base; concept-specific visual knowledge; event-centric semantic concept encoding; high-level complex events; human-perceivable semantic clues; intermediate-level event representation; lexical database; low-level visual representation; machine-interpretable semantic clues; multiple kernel learning algorithm; noise-resistant concept classifier; robust transfer learning model; semantic gap; user-defined events; video event recognition; Feature extraction; Knowledge based systems; Multimedia communication; Semantics; Streaming media; Vehicles; Visualization; Concept detection; event recognition; knowledge base

fLanguage

English

Journal_Title

Multimedia, IEEE Transactions on

Publisher

IEEE

ISSN

1520-9210

Type

jour

DOI

10.1109/TMM.2015.2449660

Filename

7132742