DocumentCode
3672063
Title
What do 15,000 object categories tell us about classifying and localizing actions?
Author
Mihir Jain;Jan C. van Gemert;Cees G. M. Snoek
Author_Institution
University of Amsterdam, The Netherlands
fYear
2015
fDate
6/1/2015
Firstpage
46
Lastpage
55
Abstract
This paper contributes to the automatic classification and localization of human actions in video. Whereas motion is the key ingredient in modern approaches, we assess the benefits of having objects in the video representation. Rather than considering a handful of carefully selected and localized objects, we conduct an empirical study on the benefit of encoding 15,000 object categories for actions, using 6 datasets totaling more than 200 hours of video and covering 180 action classes. Our key contributions are: i) the first in-depth study of encoding objects for actions; ii) we show that objects matter for actions and are often semantically relevant as well; iii) we establish that actions have object preferences, so rather than using all objects, selection is advantageous for action recognition; iv) we reveal that object-action relations are generic, which allows these relationships to be transferred from one domain to another; and v) objects, when combined with motion, improve the state-of-the-art for both action classification and localization.
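To make the idea of "encoding objects for actions" concrete, a minimal sketch follows. It assumes per-frame object-classifier scores are already available (in the paper these come from a deep network over thousands of ImageNet categories; here they are random toy data), average-pools them into a video-level object representation, keeps only the categories that respond most strongly per action, and fuses the result with a motion descriptor before training a linear classifier. All names, dimensions, and the selection heuristic are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch (not the authors' code): pool per-frame object scores
# into a video-level representation, select the strongest object categories
# per action, and fuse with a motion feature for a linear classifier.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Toy stand-ins: 40 videos, 16 frames each, scores over 15,000 object
# categories, plus a precomputed motion descriptor per video (random here).
n_videos, n_frames, n_objects, motion_dim = 40, 16, 15000, 256
frame_scores = rng.random((n_videos, n_frames, n_objects)).astype(np.float32)
motion_feat = rng.random((n_videos, motion_dim)).astype(np.float32)
labels = rng.integers(0, 4, size=n_videos)  # 4 toy action classes

# 1) Video-level object representation: average the per-frame scores.
object_feat = frame_scores.mean(axis=1)  # shape (n_videos, n_objects)

# 2) Per-action object selection: rank categories by their mean response on
#    each action's videos and keep the union of the top-k per action.
def top_objects_per_action(feats, y, k=100):
    keep = set()
    for c in np.unique(y):
        mean_resp = feats[y == c].mean(axis=0)
        keep.update(np.argsort(mean_resp)[-k:].tolist())
    return np.array(sorted(keep))

selected = top_objects_per_action(object_feat, labels, k=100)
object_sel = object_feat[:, selected]

# 3) Fuse object and motion representations by concatenating
#    L2-normalized features.
def l2_normalize(x):
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)

fused = np.hstack([l2_normalize(object_sel), l2_normalize(motion_feat)])

# 4) Linear classifier on the fused representation.
clf = LinearSVC(C=1.0).fit(fused, labels)
print("train accuracy:", clf.score(fused, labels))
```

The selection step above is only one plausible way to realize "actions have object preferences"; any per-action relevance ranking over object categories would fit the same pipeline.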
Keywords
"Encoding","Games","Accuracy","Cameras","Training","Visualization","Neural networks"
Publisher
ieee
Conference_Titel
Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference on
ISSN
1063-6919
Type
conf
DOI
10.1109/CVPR.2015.7298599
Filename
7298599