Title :
Dominant spatio-temporal modulations and energy tracking in videos: Application to interest point detection for action recognition
Author :
Georgakis, Christos ; Maragos, Petros ; Evangelopoulos, Georgios ; Dimitriadis, Dimitrios
Author_Institution :
Sch. of E.C.E., Nat. Tech. Univ. of Athens, Athens, Greece
fDate :
Sept. 30-Oct. 3, 2012
Abstract :
The presence of multiband amplitude and frequency modulations (AM-FM) in wideband signals, such as textured images or speech, has led to the development of efficient multicomponent modulation models for low-level image and sound analysis. Moreover, compact yet descriptive representations have emerged by tracking, through nonlinear energy operators, the dominant model components across time, space or frequency. In this paper, we propose a generalization of such approaches to the 3D spatio-temporal domain and explore the potential of the Dominant Component Analysis scheme for interest point detection and human action recognition in videos. Within this framework, actions are implicitly considered as manifestations of spatio-temporal oscillations in the dynamic visual stream. Multiband filtering and energy operators are applied to track the source energy in both spatial and temporal frequency bands. A new measure for extracting keypoint locations is formulated as the temporal dominant energy computed over the spatially dominant components of the input video frames, where dominance is determined by modulation energy. The theoretical formulation is supported by evaluation and comparisons in human action classification, which demonstrate the potential of the proposed spatio-temporal detector.
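The energy tracking and dominant component selection mentioned in the abstract can be illustrated with a minimal 1D sketch. It is not the authors' implementation: it assumes the discrete Teager-Kaiser operator as the nonlinear energy operator, assumes the band signals have already been produced by some filterbank, and the names `teager_kaiser`, `dominant_band`, `low` and `high` are hypothetical. The paper itself works in the 3D spatio-temporal domain.

```python
import math

def teager_kaiser(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Defined for interior samples only, so the output has len(x) - 2 values.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

def dominant_band(bands):
    """For each interior sample, return the index of the band with the largest energy."""
    energies = [teager_kaiser(b) for b in bands]
    n_samples = len(energies[0])
    return [
        max(range(len(energies)), key=lambda k: energies[k][n])
        for n in range(n_samples)
    ]

# Two hypothetical band outputs (assumed already band-pass filtered).
N = 64
low = [1.0 * math.cos(0.2 * n) for n in range(N)]   # large amplitude, low frequency
high = [0.5 * math.cos(1.0 * n) for n in range(N)]  # small amplitude, high frequency

# For A*cos(W*n) the operator yields A^2 * sin^2(W), so energy depends on
# both amplitude and frequency, and the dominant band need not be the loudest.
dominant = dominant_band([low, high])
```

In this toy case the second band dominates at every sample even though its amplitude is smaller, because the operator weights amplitude by frequency.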
Keywords :
amplitude modulation; feature extraction; filtering theory; frequency modulation; image representation; image texture; object detection; object recognition; object tracking; statistical analysis; video signal processing; 3D spatio-temporal domain; AM-FM; dominant component analysis scheme; dominant spatio-temporal modulations; dynamic visual stream; human action recognition; input video frames; interest point detection; keypoint location extraction; low-level image analysis; low-level sound analysis; multiband amplitude modulations; multiband filtering; multiband frequency modulations; multicomponent modulation models; nonlinear energy operators; source energy tracking; spatial frequency bands; spatio-temporal oscillations; temporal frequency bands; textured images; textured speech; video representations; wideband signals; Detectors; Humans; Videos; Wideband; Human action recognition in videos; dominant component analysis; multicomponent AM-FM models; spatio-temporal interest point detectors
Conference_Title :
Image Processing (ICIP), 2012 19th IEEE International Conference on
Conference_Location :
Orlando, FL
Print_ISBN :
978-1-4673-2534-9
ISSN :
1522-4880
DOI :
10.1109/ICIP.2012.6466966