DocumentCode :
2292697
Title :
Selection and context for action recognition
Author :
Han, Dong ; Bo, Liefeng ; Sminchisescu, Cristian
Author_Institution :
Univ. of Bonn, Bonn, Germany
fYear :
2009
fDate :
Sept. 29 2009-Oct. 2 2009
Firstpage :
1933
Lastpage :
1940
Abstract :
Recognizing human action in non-instrumented video is a challenging task not only because of the variability produced by general scene factors such as illumination, background, occlusion, or intra-class variability, but also because of subtle behavioral patterns among interacting people or between people and objects in images. To improve recognition, a system may need to use not only low-level spatio-temporal video correlations but also relational descriptors between people and objects in the scene. In this paper we present contextual scene descriptors and Bayesian multiple kernel learning methods for recognizing human action in complex non-instrumented video. Our contribution is threefold: (1) we introduce bag-of-detector scene descriptors that encode presence/absence and structural relations between object parts; (2) we derive a novel Bayesian classification method based on Gaussian processes with multiple kernel covariance functions (MKGPC), in order to automatically select and weight multiple features, both low-level and high-level, out of a large collection, in a principled way; and (3) we perform a large-scale evaluation using a variety of features on the KTH dataset and on a recently introduced, challenging Hollywood movie dataset. On the KTH dataset, we obtain 94.1% accuracy, the best result reported to date. On the Hollywood dataset we obtain promising results in several action classes using fewer descriptors, and an improvement of about 9.1% on a previous benchmark test.
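The abstract does not spell out the MKGPC formulation. As a rough illustration only, assuming the common multiple-kernel construction in which the Gaussian process covariance is a non-negative weighted sum of per-feature-channel kernels, k(x, x') = sum_m w_m k_m(x_m, x'_m), the sketch below builds such a combined covariance matrix. A weight of zero drops a channel, which is the sense in which weighting also performs feature selection. The RBF base kernel, the shared gamma, the names rbf_kernel and combined_covariance, and the toy descriptors are hypothetical choices, not taken from the paper; the Bayesian inference of the weights and the GP classification step itself are not shown.

```python
import math
from typing import List, Sequence

# Hypothetical types: one descriptor vector per video clip, per feature channel.
Descriptor = Sequence[float]
Matrix = List[List[float]]


def rbf_kernel(x: Descriptor, y: Descriptor, gamma: float) -> float:
    """Gaussian (RBF) base kernel exp(-gamma * ||x - y||^2); an assumed choice."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def combined_covariance(
    channels: Sequence[Sequence[Descriptor]],
    weights: Sequence[float],
    gamma: float = 1.0,
) -> Matrix:
    """Return K[i][j] = sum_m weights[m] * k(channels[m][i], channels[m][j]).

    channels[m][i] is the descriptor of clip i in feature channel m
    (e.g. a low-level motion histogram or a bag-of-detector vector).
    A non-negative weighted sum of positive semi-definite kernels is itself
    positive semi-definite, so K is a valid GP covariance matrix.
    """
    if len(channels) != len(weights):
        raise ValueError("need exactly one weight per feature channel")
    if any(w < 0 for w in weights):
        raise ValueError("kernel weights must be non-negative")
    n = len(channels[0])
    if any(len(feats) != n for feats in channels):
        raise ValueError("every channel must describe the same clips")

    K = [[0.0] * n for _ in range(n)]
    for feats, w in zip(channels, weights):
        if w == 0.0:
            continue  # a zero weight removes (deselects) this channel entirely
        for i in range(n):
            for j in range(i, n):
                value = w * rbf_kernel(feats[i], feats[j], gamma)
                K[i][j] += value
                if i != j:
                    K[j][i] += value  # fill the symmetric entry
    return K


if __name__ == "__main__":
    # Toy data (hypothetical): 3 clips, a 2-bin motion histogram channel
    # and a 3-detector presence/absence channel.
    motion = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
    scene = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 1.0, 0.0]]
    for row in combined_covariance([motion, scene], [0.7, 0.3]):
        print([round(v, 3) for v in row])
```

Here the weights are fixed by hand for clarity; in a Bayesian multiple kernel model of the kind the abstract describes, they would instead be learned from data.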
Keywords :
Bayes methods; Gaussian processes; covariance analysis; image classification; image coding; image motion analysis; learning (artificial intelligence); video signal processing; Bayesian classification method; Bayesian multiple kernel learning method; Gaussian process; Hollywood movie dataset; KTH dataset; absence encoding; bag-of-detector scene descriptors; contextual scene descriptors; human action recognition; low-level spatio-temporal video correlations; multiple kernel covariance function; noninstrumented video; presence encoding; relational descriptors; structural relation encoding; Bayesian methods; Gaussian processes; Humans; Image recognition; Kernel; Layout; Learning systems; Lighting; Pattern recognition; Performance evaluation;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Vision, 2009 IEEE 12th International Conference on
Conference_Location :
Kyoto
ISSN :
1550-5499
Print_ISBN :
978-1-4244-4420-5
Electronic_ISBN :
1550-5499
Type :
conf
DOI :
10.1109/ICCV.2009.5459427
Filename :
5459427