Title :
Parsing Videos of Actions with Segmental Grammars
Author :
Pirsiavash, Hamed ; Ramanan, D.
Author_Institution :
Massachusetts Inst. of Technol., Cambridge, MA, USA
Abstract :
Real-world videos of human activities exhibit temporal structure at various scales: long videos are typically composed of multiple action instances, where each instance is itself composed of subactions with variable durations and orderings. Temporal grammars can presumably model such hierarchical structure, but are computationally difficult to apply to long video streams. We describe simple grammars that capture hierarchical temporal structure while admitting inference with a finite-state machine. This makes parsing linear-time, constant-storage, and naturally online. We train grammar parameters using a latent structural SVM, where latent subactions are learned automatically. We illustrate the effectiveness of our approach over common baselines on a new half-million-frame dataset of continuous YouTube videos.
Keywords :
finite state machines; grammars; image segmentation; support vector machines; video streaming; actions video parsing; constant storage parsing; continuous YouTube videos; finite state machine; grammar parameter training; hierarchical structure; hierarchical temporal structure; human activities; latent structural SVM; latent subactions; linear time parsing; long video streams; multiple action instances; naturally online parsing; segmental grammars; temporal grammars; Data models; Grammar; Hidden Markov models; Markov processes; Presses; Videos;
Conference_Title :
Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on
Conference_Location :
Columbus, OH
DOI :
10.1109/CVPR.2014.85