• DocumentCode
    178104
  • Title
    Modeling the Relationship of Action, Object, and Scene
  • Author
    Jing Liu ; Xinxiao Wu ; Yang Feng
  • Author_Institution
    Beijing Lab. of Intell. Inf. Technol., Beijing Inst. of Technol., Beijing, China
  • fYear
    2014
  • fDate
    24-28 Aug. 2014
  • Firstpage
    2005
  • Lastpage
    2010
  • Abstract
    In action recognition, objects and scenes can provide a rich source of contextual information for analyzing human actions, because human actions often occur in particular scene settings with certain related objects. We therefore use the contextual object and scene to improve the performance of action recognition. Specifically, a latent structural SVM is introduced to model the co-occurrence relationship among action, object and scene, in which the object class label and the scene class label are treated as latent variables. Using this framework, we can simultaneously predict action class labels, object class labels and scene class labels. Moreover, we use a mid-level discriminative feature to describe the information of action, object and scene separately. The feature is a set of decision values from the pre-learned classifiers of each class, measuring the likelihood that the input video belongs to the corresponding class. In this paper, we use SVMs as the action and scene pre-learned classifiers, and a deformable part-based object detector as the object pre-learned classifier, so that the object location is obtained as a by-product. Experimental results on the UCF Sports, YouTube and UCF50 datasets demonstrate the effectiveness of the proposed approach.
  • Keywords
    image classification; image motion analysis; image recognition; object detection; support vector machines; video signal processing; UCF Sports; UCF50 datasets; YouTube; action pre-learned classifiers; action recognition; deformable part-based object detector; latent structural SVM; object pre-learned classifier; scene pre-learned classifiers; Accuracy; Context; Context modeling; Correlation; Feature extraction; Training; LSSVM; scene recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Pattern Recognition (ICPR), 2014 22nd International Conference on
  • Conference_Location
    Stockholm
  • ISSN
    1051-4651
  • Type
    conf
  • DOI
    10.1109/ICPR.2014.350
  • Filename
    6977062
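
The joint prediction described in the abstract can be illustrated with a minimal sketch. The abstract does not give the model's potential function, so everything here is an assumption: the scoring form (per-class decision values plus pairwise action-object and action-scene co-occurrence weights), the function name `joint_predict`, and all numbers are hypothetical and are not the authors' implementation. Inference is done by exhaustive search over the label triples.

```python
from itertools import product


def joint_predict(action_scores, object_scores, scene_scores, co_ao, co_as):
    """Return the (action, object, scene) index triple with the highest joint score.

    The *_scores lists stand in for the mid-level features: one decision
    value per class from a pre-learned classifier (hypothetical values).
    co_ao[a][o] and co_as[a][s] are assumed co-occurrence weights, which a
    latent structural SVM would learn; here they are set by hand.
    """
    best_labels, best_score = None, float("-inf")
    triples = product(
        range(len(action_scores)),
        range(len(object_scores)),
        range(len(scene_scores)),
    )
    for a, o, s in triples:
        # Unary terms from each classifier plus pairwise context terms.
        score = (
            action_scores[a]
            + object_scores[o]
            + scene_scores[s]
            + co_ao[a][o]
            + co_as[a][s]
        )
        if score > best_score:
            best_labels, best_score = (a, o, s), score
    return best_labels, best_score


# Toy example with two classes of each kind (all values are made up).
action_scores = [0.2, 0.1]   # the action classifier alone slightly prefers action 0
object_scores = [0.9, 0.1]   # the object detector strongly fires for object 0
scene_scores = [0.1, 0.8]    # the scene classifier prefers scene 1
co_ao = [[-1.0, 1.0], [1.0, -1.0]]   # action 1 tends to co-occur with object 0
co_as = [[1.0, -1.0], [-1.0, 1.0]]   # action 1 tends to co-occur with scene 1

labels, score = joint_predict(action_scores, object_scores, scene_scores, co_ao, co_as)
print(labels, round(score, 3))
```

In this toy setting the action classifier alone would pick action 0, but the object and scene evidence combined with the co-occurrence weights shifts the joint prediction to action 1, which is the kind of contextual effect the abstract describes.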