Regional Information Center for Science and Technology - Unsupervised learning of event AND-OR grammar and semantics from video

DocumentCode :

2953430

Title :

Unsupervised learning of event AND-OR grammar and semantics from video

Author :

Si, Zhangzhang ; Pei, Mingtao ; Yao, Benjamin ; Zhu, Song-Chun

Author_Institution :

Dept. of Stat., Univ. of California, Los Angeles, CA, USA

fYear :

2011

fDate :

6-13 Nov. 2011

Firstpage :

Lastpage :

Abstract :

We study the problem of automatically learning an event AND-OR grammar from videos of a certain environment, e.g., an office where students conduct daily activities. We propose to learn the event grammar under the information projection and minimum description length principles in a coherent probabilistic framework, without manual supervision about what events happen and when they happen. First, a predefined set of unary and binary relations is detected for each video frame, e.g., the agent's position, pose, and interaction with the environment. Then their co-occurrences are clustered into a dictionary of simple, transient atomic actions. These actions are recursively grouped into longer and more complex events, resulting in a stochastic event grammar. By modeling the time constraints of successive events, the learned grammar becomes context-sensitive. We introduce a new dataset of surveillance-style video in an office and present a prototype system for video analysis that integrates bottom-up detection, grammatical learning, and parsing. On this dataset, the learning algorithm is able to automatically discover important events and construct a stochastic grammar, which can be used to accurately parse newly observed video. The learned grammar can be used as a prior to improve the noisy bottom-up detection of atomic actions. It can also be used to infer the semantics of the scene. In general, the event grammar is an efficient way to acquire common knowledge from video.
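The bottom-up pipeline described in the abstract (per-frame relations, clustered into atomic actions, recursively grouped into events) can be illustrated with a toy sketch. This is not the authors' algorithm. The function names `atomic_actions`, `induce_and_rules`, and `expand`, the relation strings, and the example frames are all hypothetical. Atomic actions are "clustered" here by exact match of per-frame relation sets, and events are formed by greedily merging the most frequent adjacent pair of symbols into a new AND-node (a Re-Pair-style heuristic). The sketch only counts frequencies: it does not compute information projection or description length, and it omits OR-nodes and the time constraints the paper models.

```python
from collections import Counter


def atomic_actions(frames):
    """Map each frame's set of detected relations to an atomic-action symbol.

    Toy clustering (an assumption, not the paper's method): frames with
    identical relation sets share a symbol. Consecutive repeats are then
    collapsed, since one transient action spans several frames.
    """
    ids = {}
    raw = []
    for rels in frames:
        key = frozenset(rels)
        if key not in ids:
            ids[key] = f"a{len(ids)}"
        raw.append(ids[key])
    collapsed = [s for i, s in enumerate(raw) if i == 0 or s != raw[i - 1]]
    return collapsed, ids


def induce_and_rules(seq, min_count=2):
    """Repeatedly replace the most frequent adjacent pair with a new AND symbol.

    Each rule E -> (left, right) is a binary AND-node. Merging stops when no
    pair occurs at least `min_count` times. Frequency only; no MDL score.
    """
    rules = {}
    seq = list(seq)
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        (left, right), count = pairs.most_common(1)[0]
        if count < min_count:
            break
        name = f"E{len(rules)}"
        rules[name] = (left, right)
        out, i = [], 0
        while i < len(seq):  # left-to-right, non-overlapping replacement
            if i + 1 < len(seq) and seq[i] == left and seq[i + 1] == right:
                out.append(name)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return rules, seq


def expand(symbol, rules):
    """Expand an event symbol back into its atomic-action terminals."""
    if symbol not in rules:
        return [symbol]
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)


# Hypothetical per-frame relations for one agent in an office.
A = {"at(door)", "walking"}
B = {"at(desk)", "sitting"}
C = {"at(desk)", "touch(keyboard)"}
frames = [A, A, B, C, C, B, A, B, C, B, A]

seq, vocab = atomic_actions(frames)
rules, top = induce_and_rules(seq)
print(rules)              # {'E0': ('a0', 'a1'), 'E1': ('E0', 'a2'), 'E2': ('E1', 'a1')}
print(top)                # ['E2', 'E2', 'a0']
print(expand("E2", rules))  # ['a0', 'a1', 'a2', 'a1']
```

In this toy run, `E2` expands to walking at the door, sitting at the desk, touching the keyboard, and sitting again, a composite event that recurs in the stream. A fuller version would add OR-nodes for alternative expansions of the same event and would accept a merge only if it reduces the description length, rather than relying on raw pair counts.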

Keywords :

grammars; learning (artificial intelligence); video signal processing; bottom-up detection; coherent probabilistic framework; event AND-OR grammar; grammatical learning; knowledge acquisition; learning algorithm; parsing; stochastic event grammar; surveillance-style video; time constraints; unsupervised learning; video analysis; video frame; Color; Data models; Grammar; Production; Semantics; Stochastic processes; Transient analysis;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Computer Vision (ICCV), 2011 IEEE International Conference on

Conference_Location :

Barcelona

ISSN :

1550-5499

Print_ISBN :

978-1-4577-1101-5

Type :

conf

DOI :

10.1109/ICCV.2011.6126223

Filename :

6126223

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2953430