  • DocumentCode
    3748950
  • Title
    Objects2action: Classifying and Localizing Actions without Any Video Example
  • Author
    Mihir Jain; Jan C. van Gemert; Thomas Mensink; Cees G. M. Snoek
  • fYear
    2015
  • Firstpage
    4588
  • Lastpage
    4596
  • Abstract
    The goal of this paper is to recognize actions in video without the need for examples. Different from traditional zero-shot approaches, we do not demand the design and specification of attribute classifiers and class-to-attribute mappings to allow for transfer from seen classes to unseen classes. Our key contribution is objects2action, a semantic word embedding that is spanned by a skip-gram model of thousands of object categories. Action labels are assigned to an object encoding of unseen video based on a convex combination of action and object affinities. Our semantic embedding has three main characteristics to accommodate the specifics of actions. First, we propose a mechanism to exploit multiple-word descriptions of actions and objects. Second, we incorporate the automated selection of the most responsive objects per action. Finally, we demonstrate how to extend our zero-shot approach to the spatio-temporal localization of actions in video. Experiments on four action datasets demonstrate the potential of our approach. (A toy sketch of the scoring rule appears after this record.)
  • Keywords
    "Semantics","Image recognition","Encoding","Neural networks","Training","Visualization","Computational modeling"
  • Publisher
    IEEE
  • Conference_Title
    2015 IEEE International Conference on Computer Vision (ICCV)
  • Electronic_ISSN
    2380-7504
  • Type
    conf
  • DOI
    10.1109/ICCV.2015.521
  • Filename
    7410878
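
The abstract's scoring rule, assigning action labels to a video's object encoding via a convex combination of action-object affinities in a skip-gram embedding space, can be illustrated with a small sketch. Everything below is a hedged illustration, not the authors' implementation: the random vectors stand in for skip-gram word embeddings, the hypothetical video_obj_probs stands in for deep-network object scores, and the softmax weighting is one simple way to form convex weights from affinities.

    import numpy as np

    # Toy stand-ins: the paper uses skip-gram embeddings trained on text and
    # object probabilities from a deep network; the random vectors here are
    # placeholders for illustration only.
    rng = np.random.default_rng(0)
    dim = 300
    objects = ["ball", "racket", "horse", "saddle"]
    actions = ["playing tennis", "riding horse"]
    word_vec = {w: rng.normal(size=dim)
                for name in objects + actions for w in name.split()}

    def embed(name):
        # Multi-word descriptions: average the word vectors, then normalize
        # (one of the abstract's mechanisms for multi-word labels).
        v = np.stack([word_vec[w] for w in name.split()]).mean(axis=0)
        return v / np.linalg.norm(v)

    obj_emb = np.stack([embed(o) for o in objects])   # (num_objects, dim)

    def action_scores(object_probs, top_t=2):
        # Score each action as a convex combination of the video's object
        # responses, keeping only the top-T most related objects per action
        # (the abstract's "most responsive objects" selection).
        scores = {}
        for a in actions:
            affinity = obj_emb @ embed(a)             # cosine similarities
            top = np.argsort(affinity)[-top_t:]       # top-T related objects
            w = np.exp(affinity[top])
            w /= w.sum()                              # convex weights
            scores[a] = float(w @ object_probs[top])
        return scores

    # Hypothetical video where 'ball' and 'racket' detectors fire strongly.
    video_obj_probs = np.array([0.55, 0.35, 0.05, 0.05])
    print(action_scores(video_obj_probs))

With real skip-gram embeddings, "playing tennis" would lie near "ball" and "racket", so such a video would score highest for that action; with the random placeholder vectors the printed scores are arbitrary.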