• DocumentCode
    2953430
  • Title
    Unsupervised learning of event AND-OR grammar and semantics from video
  • Author
    Si, Zhangzhang ; Pei, Mingtao ; Yao, Benjamin ; Zhu, Song-Chun
  • Author_Institution
    Dept. of Stat., Univ. of California, Los Angeles, CA, USA
  • fYear
    2011
  • fDate
    6-13 Nov. 2011
  • Firstpage
    41
  • Lastpage
    48
  • Abstract
    We study the problem of automatically learning an event AND-OR grammar from videos of a given environment, e.g. an office where students conduct daily activities. We propose to learn the event grammar under the information projection and minimum description length principles in a coherent probabilistic framework, without manual supervision about what events happen and when they happen. First, a predefined set of unary and binary relations is detected for each video frame, e.g. the agent's position, pose and interaction with the environment. Then their co-occurrences are clustered into a dictionary of simple and transient atomic actions. These actions are recursively grouped into longer and more complex events, resulting in a stochastic event grammar. By modeling time constraints of successive events, the learned grammar becomes context-sensitive. We introduce a new dataset of surveillance-style video in an office, and present a prototype system for video analysis integrating bottom-up detection, grammatical learning and parsing. On this dataset, the learning algorithm is able to automatically discover important events and construct a stochastic grammar, which can be used to accurately parse newly observed video. The learned grammar can be used as a prior to improve the noisy bottom-up detection of atomic actions. It can also be used to infer semantics of the scene. In general, the event grammar is an efficient way to acquire common knowledge from video.
  • Keywords
    grammars; learning (artificial intelligence); video signal processing; bottom-up detection; coherent probabilistic framework; event AND-OR grammar; grammatical learning; knowledge acquisition; learning algorithm; parsing; stochastic event grammar; surveillance-style video; time constraints; unsupervised learning; video analysis; video frame; Color; Data models; Grammar; Production; Semantics; Stochastic processes; Transient analysis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Vision (ICCV), 2011 IEEE International Conference on
  • Conference_Location
    Barcelona
  • ISSN
    1550-5499
  • Print_ISBN
    978-1-4577-1101-5
  • Type
    conf
  • DOI
    10.1109/ICCV.2011.6126223
  • Filename
    6126223