Title :
Action Recognition Using Spatial-Temporal Context
Author :
Hu, Qiong ; Qin, Lei ; Huang, Qingming ; Jiang, Shuqiang ; Tian, Qi
Author_Institution :
Key Laboratory of Intelligent Information Processing, Chinese Academy of Sciences (CAS), Beijing, China
Abstract :
Spatial-temporal local features and the bag-of-words representation have been widely used in action recognition. However, this framework usually neglects the internal spatial-temporal relations between video-words, resulting in ambiguity in the action recognition task, especially for videos “in the wild”. In this paper, we address this problem by utilizing the volumetric context around a video-word. A local histogram of the video-word distribution is calculated, which is referred to as the “context” and further clustered into contextual words. To effectively use this contextual information, spatial-temporal descriptive video-phrases (ST-DVPs) and spatial-temporal descriptive video-cliques (ST-DVCs) are proposed. A general framework for ST-DVP and ST-DVC generation is described, and action recognition is then performed based on all these representations and their combinations. The proposed method is evaluated on two challenging human action datasets: the KTH dataset and the YouTube dataset. Experimental results confirm the validity of our approach.
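Illustrative Sketch :
The following is a minimal sketch (not the authors' code) of the contextual-word idea described in the abstract: for each spatial-temporal interest point, build a local histogram of the video-word labels falling inside a surrounding volume, then cluster those histograms into contextual words. All names, radii, and vocabulary sizes below are illustrative assumptions, not values from the paper.

# Sketch of "volumetric context" histograms and contextual words, under assumed
# parameters; the paper's ST-DVP/ST-DVC selection step is not reproduced here.
import numpy as np
from sklearn.cluster import KMeans

def contextual_words(points, word_labels, vocab_size,
                     radius_xy=20.0, radius_t=10.0, n_contextual=50, seed=0):
    """points: (N, 3) array of (x, y, t) interest-point locations.
    word_labels: (N,) video-word index of each point, in [0, vocab_size).
    Returns (N,) contextual-word assignments and the (N, vocab_size) histograms."""
    points = np.asarray(points, dtype=float)
    word_labels = np.asarray(word_labels)
    n = len(points)
    hists = np.zeros((n, vocab_size))
    for i in range(n):
        # Volumetric context: neighbours within a spatial radius and a temporal radius.
        d_xy = np.linalg.norm(points[:, :2] - points[i, :2], axis=1)
        d_t = np.abs(points[:, 2] - points[i, 2])
        neighbours = (d_xy <= radius_xy) & (d_t <= radius_t)
        neighbours[i] = False  # the context excludes the centre word itself
        hists[i] = np.bincount(word_labels[neighbours], minlength=vocab_size)
        if hists[i].sum() > 0:
            hists[i] /= hists[i].sum()  # normalise the local word distribution
    # Cluster the local histograms into contextual words.
    km = KMeans(n_clusters=n_contextual, n_init=10, random_state=seed)
    context_ids = km.fit_predict(hists)
    return context_ids, hists

In the paper's terms, each interest point would then carry both a video-word and a contextual-word label, from which descriptive video-phrases and video-cliques are formed; the neighbourhood radii and cluster count above are placeholders chosen for illustration.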
Keywords :
gesture recognition; image representation; video signal processing; KTH dataset; ST-DVC; ST-DVP; YouTube dataset; action recognition; bag of words representation; contextual information; descriptive video-cliques; descriptive video-phrases; human action datasets; image representations; spatial-temporal context; spatial-temporal local features; video-words distribution; Accuracy; Context; Feature extraction; Histograms; Humans; Videos; YouTube
Conference_Title :
2010 20th International Conference on Pattern Recognition (ICPR)
Conference_Location :
Istanbul, Turkey
Print_ISBN :
978-1-4244-7542-1
DOI :
10.1109/ICPR.2010.376