• DocumentCode
    3748946
  • Title
    Action Recognition by Hierarchical Mid-Level Action Elements
  • Author
    Tian Lan; Yuke Zhu; Amir Roshan Zamir; Silvio Savarese
  • Author_Institution
    Stanford Univ., Stanford, CA, USA
  • fYear
    2015
  • Firstpage
    4552
  • Lastpage
    4560
  • Abstract
    Realistic videos of human actions exhibit rich spatiotemporal structures at multiple levels of granularity: an action can always be decomposed into multiple finer-grained elements in both space and time. To capture this intuition, we propose to represent videos by a hierarchy of mid-level action elements (MAEs), where each MAE corresponds to an action-related spatiotemporal segment in the video. We introduce an unsupervised method to generate this representation from videos. Our method is capable of distinguishing action-related segments from background segments and representing actions at multiple spatiotemporal resolutions. Given a set of spatiotemporal segments generated from the training data, we introduce a discriminative clustering algorithm that automatically discovers MAEs at multiple levels of granularity. We develop structured models that capture a rich set of spatial, temporal and hierarchical relations among the segments, where the action label and multiple levels of MAE labels are jointly inferred. The proposed model achieves state-of-the-art performance in multiple action recognition benchmarks. Moreover, we demonstrate the effectiveness of our model in real-world applications such as action recognition in large-scale untrimmed videos and action parsing.
  • Keywords
    "Spatiotemporal phenomena","Videos","Proposals","Training","Semantics","Manganese","Distance measurement"
  • Publisher
    IEEE
  • Conference_Titel
    Computer Vision (ICCV), 2015 IEEE International Conference on
  • Electronic_ISBN
    2380-7504
  • Type
    conf
  • DOI
    10.1109/ICCV.2015.517
  • Filename
    7410874