Multi-view Super Vector for Action Recognition

Author

Zhuowei Cai ; Limin Wang ; Xiaojiang Peng ; Yu Qiao

Author_Institution

Shenzhen Key Lab. of Comput. Vision & Pattern Recognition, Shenzhen Inst. of Adv. Technol., Shenzhen, China

fYear

2014

fDate

23-28 June 2014

Firstpage

596

Lastpage

603

Abstract

Images and videos are often characterized by multiple types of local descriptors, such as SIFT, HOG and HOF, each of which describes certain aspects of object features. Recognition systems benefit from fusing multiple types of these descriptors. Two widely applied fusion pipelines are descriptor concatenation and kernel average. The first is effective when different descriptors are strongly correlated, while the second is likely better when descriptors are relatively independent. In practice, however, different descriptors are neither fully independent nor fully correlated, so previous fusion methods may be unsatisfactory. In this paper, we propose a new global representation, the Multi-View Super Vector (MVSV), which is composed of relatively independent components derived from a pair of descriptors. Kernel average is then applied to these components to produce the recognition result. To obtain MVSV, we develop a generative mixture model of probabilistic canonical correlation analyzers (M-PCCA), and use the hidden factors and gradient vectors of M-PCCA to construct MVSV for video representation. Experiments on video-based action recognition tasks show that MVSV achieves promising results and outperforms FV and VLAD with either descriptor concatenation or kernel average as the fusion strategy.
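The two baseline fusion pipelines named in the abstract, descriptor concatenation and kernel average, can be contrasted in a minimal sketch. This is not the authors' implementation: mean pooling with L2 normalization stands in for FV/VLAD encoding, the RBF kernel and its `gamma` are assumed choices, the descriptor sizes (96 and 108) are illustrative, and all function names are hypothetical.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """RBF kernel matrix between the rows of X (assumed kernel choice)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

def encode(descs):
    """Stand-in for FV/VLAD: mean-pool local descriptors, then L2-normalize."""
    v = descs.mean(axis=0)
    return v / (np.linalg.norm(v) + 1e-12)

def concat_fusion_kernel(view_a, view_b, gamma=1.0):
    """Descriptor concatenation: join both descriptor types at each local
    point, encode the joint descriptors, then compute a single kernel."""
    X = np.stack([encode(np.hstack([a, b])) for a, b in zip(view_a, view_b)])
    return rbf_kernel(X, gamma)

def kernel_average(view_a, view_b, gamma=1.0):
    """Kernel average: encode each descriptor type separately, compute one
    kernel per type, and average the two kernel matrices."""
    Xa = np.stack([encode(a) for a in view_a])
    Xb = np.stack([encode(b) for b in view_b])
    return 0.5 * (rbf_kernel(Xa, gamma) + rbf_kernel(Xb, gamma))

# Synthetic data: 4 videos, 50 local points each, two descriptor types.
rng = np.random.default_rng(0)
n_videos, n_points = 4, 50
hog = [rng.normal(size=(n_points, 96)) for _ in range(n_videos)]   # illustrative size
hof = [rng.normal(size=(n_points, 108)) for _ in range(n_videos)]  # illustrative size

K_concat = concat_fusion_kernel(hog, hof)
K_avg = kernel_average(hog, hof)
```

Either kernel matrix could be passed to a classifier that accepts precomputed kernels. The sketch only shows where fusion happens in each pipeline (before encoding versus after kernel computation); it does not model the M-PCCA components that MVSV is built from.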

Keywords

feature extraction; image motion analysis; image representation; object recognition; statistical analysis; video signal processing; HOF descriptors; HOG descriptors; M-PCCA; MVSV representation; SIFT descriptors; descriptor concatenation; fusion pipelines; generative mixture model of probabilistic canonical correlation analyzers; histogram-of-oriented gradients; kernel average; multiview super vector; object feature; scale invariant feature transforms; video based action recognition tasks; Accuracy; Correlation; Encoding; Kernel; Probabilistic logic; Vectors; Videos; action recognition; canonical correlation analysis; mixture model; multi-view

fLanguage

English

Publisher

IEEE

Conference_Titel

Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on

Conference_Location

Columbus, OH

Type

conf

DOI

10.1109/CVPR.2014.83

Filename

6909477