Feature-Independent Action Spotting without Human Localization, Segmentation, or Frame-wise Tracking

Author

Sun, Chuan; Tappen, Marshall; Foroosh, H.

Author_Institution

Dept. of EECS, Univ. of Central Florida, Orlando, FL, USA

fYear

2014

fDate

23-28 June 2014

Firstpage

2689

Lastpage

2696

Abstract

In this paper, we propose an unsupervised framework for action spotting in videos that does not depend on any specific feature (e.g., HOG/HOF, STIP, silhouette, bag-of-words). Furthermore, our solution requires no human localization, segmentation, or frame-wise tracking. This is achieved by treating the problem holistically as that of extracting the internal dynamics of video cuboids by modeling them in their natural form as multilinear tensors. To extract their internal dynamics, we devise a novel Two-Phase Decomposition (TP-Decomp) of a tensor that generates very compact and discriminative representations that are robust even to heavily perturbed data. Technically, a Rank-based Tensor Core Pyramid (Rank-TCP) descriptor is generated by combining multiple tensor cores under multiple ranks, which allows video cuboids to be represented in a hierarchical tensor pyramid. The problem then reduces to a template matching problem, which is solved efficiently using two boosting strategies: (1) to reduce the search space, we filter the dense trajectory cloud extracted from the target video; (2) to boost the matching speed, we perform matching in an iterative coarse-to-fine manner. Experiments on 5 benchmarks show that our method outperforms the current state of the art under various challenging conditions. We also created a challenging dataset called Heavily Perturbed Video Array (HPVA) to validate the robustness of our framework under heavily perturbed conditions.
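
The iterative coarse-to-fine matching mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's Rank-TCP matching procedure: it assumes plain 1-D numeric sequences in place of tensor descriptors, a sum-of-squared-differences score, and a single coarse level. All function names and the `factor` parameter are hypothetical and chosen only for illustration.

```python
def ssd(a, b):
    """Sum of squared differences between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def downsample(seq, factor):
    """Average consecutive blocks of `factor` samples (incomplete tail is dropped)."""
    n = len(seq) // factor
    return [sum(seq[i * factor:(i + 1) * factor]) / factor for i in range(n)]


def best_offset(signal, template, candidates):
    """Return the candidate offset whose window has the lowest SSD to the template."""
    m = len(template)
    # Keep only offsets where the template fits entirely inside the signal.
    valid = [o for o in candidates if 0 <= o <= len(signal) - m]
    return min(valid, key=lambda o: ssd(signal[o:o + m], template))


def coarse_to_fine_match(signal, template, factor=4):
    """Locate `template` in `signal`: exhaustive search at low resolution,
    then a local search at full resolution around the coarse estimate."""
    coarse_sig = downsample(signal, factor)
    coarse_tpl = downsample(template, factor)

    # Coarse pass: every offset is tested, but on sequences `factor` times shorter.
    coarse = best_offset(coarse_sig, coarse_tpl,
                         range(len(coarse_sig) - len(coarse_tpl) + 1))

    # Fine pass: only offsets within one coarse step of the estimate are tested.
    centre = coarse * factor
    return best_offset(signal, template,
                       range(centre - factor, centre + factor + 1))
```

The coarse pass narrows the search to a small neighbourhood, so the full-resolution comparison runs on only `2 * factor + 1` candidate offsets instead of every position in the signal. A single coarse level is used here for brevity; an iterative scheme would repeat the refinement over several resolutions.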

Keywords

feature extraction; image matching; image representation; iterative methods; tensors; HPVA; TP-Decomp; compact representations; dense trajectory cloud; discriminative representations; feature-independent action spotting; heavily perturbed video array; hierarchical tensor pyramid; internal dynamics; iterative coarse-to-fine manner; matching speed; multilinear tensors; multiple tensor cores; rank-TCP descriptor; rank-based tensor core pyramid descriptor; search space; template matching problem; two-phase decomposition; unsupervised framework; video cuboids; Boosting; Feature extraction; Noise; Robustness; Tensile stress; Vectors; Videos;

fLanguage

English

Publisher

IEEE

Conference_Titel

Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on

Conference_Location

Columbus, OH

Type

conf

DOI

10.1109/CVPR.2014.344

Filename

6909740