Regional Information Center for Science and Technology - Spatio-temporal steerable pyramid for human action recognition

DocumentCode :

615093

Title :

Spatio-temporal steerable pyramid for human action recognition

Author :

Xiantong Zhen ; Ling Shao

Author_Institution :

Dept. of Electron. & Electr. Eng., Univ. of Sheffield, Sheffield, UK

fYear :

2013

fDate :

22-26 April 2013

Firstpage :

Lastpage :

Abstract :

In this paper, we propose a novel holistic representation based on the spatio-temporal steerable pyramid (STSP) for human action recognition. The spatio-temporal Laplacian pyramid provides an effective technique for multi-scale analysis of video sequences. By decomposing spatio-temporal volumes into band-passed sub-volumes, spatio-temporal patterns residing at different scales are well localized. Three-dimensional separable steerable filters are then applied to each sub-volume to capture the spatio-temporal orientation information efficiently. The outputs of the quadrature pair of steerable filters are squared and summed to yield a more robust measure of motion energy. To make the representation invariant to shifts and applicable to coarsely extracted bounding boxes for the performed actions, max pooling is applied between filter responses at adjacent scales and over spatio-temporal local neighborhoods. Taking advantage of multi-scale and multi-orientation analysis and feature pooling, STSP produces a compact but informative and invariant representation of human actions. We conduct extensive experiments on the KTH, IXMAS and HMDB51 datasets, and the proposed STSP achieves results comparable to those of state-of-the-art methods.
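
The energy and pooling steps described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`quadrature_energy`, `max_pool_scales`, `max_pool_local`) are hypothetical, the steerable filters themselves are not implemented, and idealized cosine/sine responses stand in for the outputs of a quadrature filter pair. It also assumes that responses from adjacent scales have already been resampled to a common shape.

```python
import numpy as np

def quadrature_energy(even_resp, odd_resp):
    """Square and sum the two responses of a quadrature filter pair."""
    return even_resp ** 2 + odd_resp ** 2

def max_pool_scales(energy_a, energy_b):
    """Element-wise max between energy volumes from two adjacent scales.

    Assumption: both volumes were already resampled to the same shape.
    """
    if energy_a.shape != energy_b.shape:
        raise ValueError("resample the volumes to a common shape first")
    return np.maximum(energy_a, energy_b)

def max_pool_local(volume, block=(2, 2, 2)):
    """Non-overlapping max pooling over (t, y, x) blocks of a 3D volume."""
    bt, by, bx = block
    t, y, x = (s // b for s, b in zip(volume.shape, block))
    # Drop trailing samples that do not fill a whole block.
    trimmed = volume[:t * bt, :y * by, :x * bx]
    return trimmed.reshape(t, bt, y, by, x, bx).max(axis=(1, 3, 5))

# Idealized stand-in for filter outputs: a unit-amplitude pattern with
# random local phase, seen through an even (cosine) and odd (sine) filter.
rng = np.random.default_rng(0)
phase = rng.uniform(0.0, 2.0 * np.pi, size=(4, 4, 4))
even, odd = np.cos(phase), np.sin(phase)

energy = quadrature_energy(even, odd)            # phase cancels out
pooled = max_pool_local(max_pool_scales(energy, 0.5 * energy))
```

In this toy setting the energy is the same at every location regardless of phase, which is one way to read the abstract's claim that squaring and summing the quadrature pair gives a more robust measure than either response alone. The block-wise pooling then trades spatial and temporal resolution for tolerance to small shifts.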

Keywords :

filtering theory; image motion analysis; image representation; image sequences; object recognition; video signal processing; HMDB51 dataset; IXMAS dataset; KTH dataset; STSP; band-passed subvolume; coarsely-extracted bounding box; holistic representation; human action recognition; max pooling operation; motion energy; multiorientation analysis; multiscale analysis; separable steerable filter; spatio-temporal Laplacian pyramid; spatio-temporal steerable pyramid; video sequence; Energy measurement; Feature extraction; Laplace equations; Principal component analysis; Robustness; Video sequences; Visualization;

fLanguage :

English

Publisher :

IEEE

Conference_Titel :

Automatic Face and Gesture Recognition (FG), 2013 10th IEEE International Conference and Workshops on

Conference_Location :

Shanghai

Print_ISBN :

978-1-4673-5545-2

Electronic_ISBN :

978-1-4673-5544-5

Type :

conf

DOI :

10.1109/FG.2013.6553732

Filename :

6553732

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=615093