DocumentCode :
739698
Title :
Recognizing Actions Through Action-Specific Person Detection
Author :
Khan, Fahad Shahbaz; Xu, Jiaolong; van de Weijer, Joost; Bagdanov, Andrew D.; Anwer, Rao Muhammad; Lopez, Antonio M.
Author_Institution :
Dept. of Electr. Engineering, Computer Vision Lab., Linkoping Univ., Linkoping, Sweden
Volume :
24
Issue :
11
fYear :
2015
Firstpage :
4422
Lastpage :
4432
Abstract :
Action recognition in still images is a challenging problem in computer vision. To facilitate comparative evaluation independently of person detection, the standard evaluation protocol for action recognition uses an oracle person detector to obtain perfect bounding box information at both training and test time. The assumption is that, in practice, a general person detector will provide candidate bounding boxes for action recognition. In this paper, we argue that this paradigm is suboptimal and that action class labels should already be considered during the detection stage. Motivated by the observation that body pose is strongly conditioned on action class, we show that: 1) existing state-of-the-art generic person detectors are not adequate for proposing candidate bounding boxes for action classification; 2) due to limited training examples, directly training action-specific person detectors is also inadequate; and 3) using only a small number of labeled action examples, transfer learning can adapt an existing detector to propose higher-quality bounding boxes for subsequent action classification. To the best of our knowledge, we are the first to investigate transfer learning for the task of action-specific person detection in still images. We perform extensive experiments on two benchmark data sets: Stanford-40 and PASCAL VOC 2012. For the action detection task (i.e., both localizing the person and classifying the action performed), our approach outperforms methods based on general person detection by 5.7% mean average precision (MAP) on Stanford-40 and by 2.1% MAP on PASCAL VOC 2012. Our approach also significantly outperforms the state of the art, with a MAP of 45.4% on Stanford-40 and 31.4% on PASCAL VOC 2012. We also evaluate our action detection approach on the task of action classification (i.e., recognizing actions without localizing them). For this task, our approach, which uses no ground-truth person localization at test time, outperforms on both data sets the state-of-the-art methods that do use person locations.
Keywords :
computer vision; image recognition; protocols; PASCAL VOC 2012; Stanford-40; action class labels; action classification; action recognition; action-specific person detection; bounding boxes; limited training examples; mean average precision; person classification; person localization; standard evaluation protocol; Adaptation models; Detectors; Feature extraction; Proposals; Standards; Training; Deep features; Transfer Learning;
fLanguage :
English
Journal_Title :
Image Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1057-7149
Type :
jour
DOI :
10.1109/TIP.2015.2465147
Filename :
7180357