• DocumentCode
    2955257
  • Title
    Learning spatiotemporal graphs of human activities
  • Author
    Brendel, William ; Todorovic, Sinisa
  • Author_Institution
    Oregon State Univ., Corvallis, OR, USA
  • fYear
    2011
  • fDate
    6-13 Nov. 2011
  • Firstpage
    778
  • Lastpage
    785
  • Abstract
    Complex human activities occurring in videos can be defined in terms of temporal configurations of primitive actions. Prior work typically hand-picks the primitives, their total number, and temporal relations (e.g., allow only followed-by), and then only estimates their relative significance for activity recognition. We advance prior work by learning what activity parts and their spatiotemporal relations should be captured to represent the activity, and how relevant they are for enabling efficient inference in realistic videos. We represent videos by spatiotemporal graphs, where nodes correspond to multiscale video segments, and edges capture their hierarchical, temporal, and spatial relationships. Access to video segments is provided by our new, multiscale segmenter. Given a set of training spatiotemporal graphs, we learn their archetype graph and the pdfs associated with model nodes and edges. The model adaptively learns relevant video segments and their relations from data, addressing the “what” and the “how.” Inference and learning are formulated within the same framework, that of a robust least-squares optimization, which is invariant to arbitrary permutations of nodes in spatiotemporal graphs. The model is used for parsing new videos in terms of detecting and localizing relevant activity parts. We outperform the state of the art on the benchmark Olympic and UT human-interaction datasets, under a favorable complexity-vs.-accuracy trade-off.
  • Keywords
    graph theory; graphs; image recognition; video signal processing; activity recognition; archetype graph; complex human activities; multiscale segmenter; multiscale video segment; realistic videos; spatiotemporal graphs; spatiotemporal relations; Electron tubes; Hidden Markov models; Optimization; Spatiotemporal phenomena; Training; Vectors; Videos;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Vision (ICCV), 2011 IEEE International Conference on
  • Conference_Location
    Barcelona
  • ISSN
    1550-5499
  • Print_ISBN
    978-1-4577-1101-5
  • Type
    conf
  • DOI
    10.1109/ICCV.2011.6126316
  • Filename
    6126316