DocumentCode
253652
Title
Multi-view Super Vector for Action Recognition
Author
Zhuowei Cai ; Limin Wang ; Xiaojiang Peng ; Yu Qiao
Author_Institution
Shenzhen Key Lab. of Comput. Vision & Pattern Recognition, Shenzhen Inst. of Adv. Technol., Shenzhen, China
fYear
2014
fDate
23-28 June 2014
Firstpage
596
Lastpage
603
Abstract
Images and videos are often characterized by multiple types of local descriptors such as SIFT, HOG and HOF, each of which describes certain aspects of object features. Recognition systems benefit from fusing multiple types of these descriptors. Two widely applied fusion pipelines are descriptor concatenation and kernel average. The first is effective when different descriptors are strongly correlated, while the second is probably better when descriptors are relatively independent. In practice, however, different descriptors are neither fully independent nor fully correlated, and previous fusion methods may not be satisfactory. In this paper, we propose a new global representation, Multi-View Super Vector (MVSV), which is composed of relatively independent components derived from a pair of descriptors. Kernel average is then applied to these components to produce the recognition result. To obtain MVSV, we develop a generative mixture model of probabilistic canonical correlation analyzers (M-PCCA), and utilize the hidden factors and gradient vectors of M-PCCA to construct MVSV for video representation. Experiments on video-based action recognition tasks show that MVSV achieves promising results and outperforms FV and VLAD combined with either the descriptor concatenation or the kernel average fusion strategy.
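The two baseline fusion pipelines named in the abstract can be illustrated with a minimal sketch. It is not the paper's MVSV or M-PCCA method, only a toy contrast between descriptor concatenation and kernel average. The function names, the RBF kernel choice, the `gamma` value, and the descriptor values are all assumptions made for illustration.

```python
import math

def rbf(x, y, gamma=0.5):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def concat_kernel(views_a, views_b, gamma=0.5):
    """Descriptor concatenation: join all views, then apply one kernel."""
    cat_a = [v for view in views_a for v in view]
    cat_b = [v for view in views_b for v in view]
    return rbf(cat_a, cat_b, gamma)

def average_kernel(views_a, views_b, gamma=0.5):
    """Kernel average: one kernel per descriptor view, then the mean."""
    per_view = [rbf(a, b, gamma) for a, b in zip(views_a, views_b)]
    return sum(per_view) / len(per_view)

# Hypothetical video-level encodings (made-up values):
# each video is [HOG-like view, HOF-like view].
video_a = [[0.2, 0.8], [0.5, 0.1, 0.4]]
video_b = [[0.3, 0.6], [0.9, 0.0, 0.1]]

k_concat = concat_kernel(video_a, video_b)
k_avg = average_kernel(video_a, video_b)
print(k_concat, k_avg)
```

With an RBF kernel and a shared `gamma`, the concatenation kernel equals the product of the per-view kernels, whereas kernel average takes their mean, so the two pipelines weigh disagreement between views differently.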
Keywords
feature extraction; image motion analysis; image representation; object recognition; statistical analysis; video signal processing; HOF descriptors; HOG descriptors; M-PCCA; MVSV representation; SIFT descriptors; descriptor concatenation; fusion pipelines; generative mixture model of probabilistic canonical correlation analyzers; histogram-of-oriented gradients; kernel average; multiview super vector; object feature; scale invariant feature transforms; video based action recognition tasks; Accuracy; Correlation; Encoding; Kernel; Probabilistic logic; Vectors; Videos; action recognition; canonical correlation analysis; mixture model; multi-view;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on
Conference_Location
Columbus, OH
Type
conf
DOI
10.1109/CVPR.2014.83
Filename
6909477