Combining the Right Features for Complex Event Recognition

Author

Tang, Ke; Yao, Bangpeng; Li, Fei-Fei; Koller, Daphne

Author_Institution

Comput. Sci. Dept., Stanford Univ., Stanford, CA, USA

fYear

2013

fDate

1-8 Dec. 2013

Firstpage

2696

Lastpage

2703

Abstract

In this paper, we tackle the problem of combining features extracted from video for complex event recognition. Feature combination is especially relevant for video data, as there are many features we can extract, ranging from image features computed from individual frames to video features that take temporal information into account. To combine features effectively, we propose a method that can select among different subsets of features, as some features or feature combinations may be uninformative for certain classes. We introduce a hierarchical method for combining features based on the AND/OR graph structure, where nodes in the graph represent combinations of different sets of features. Our method automatically learns the structure of the AND/OR graph using score-based structure learning, and we introduce an inference procedure that efficiently computes structure scores. We present promising results and analysis on the difficult, large-scale 2011 TRECVID Multimedia Event Detection dataset.

Keywords

feature extraction; learning (artificial intelligence); video signal processing; AND-OR graph structure; TRECVID multimedia event detection dataset; complex event recognition; features extraction; hierarchical method; inference procedure; score-based structure learning; temporal information; video data; video features; Animals; Feature extraction; Histograms; Image color analysis; Kernel; TV; Training; Complex Event Recognition; Feature Combination

fLanguage

English

Publisher

IEEE

Conference_Titel

Computer Vision (ICCV), 2013 IEEE International Conference on

Conference_Location

Sydney, NSW

ISSN

1550-5499

Type

conf

DOI

10.1109/ICCV.2013.335

Filename

6751446