Title :
Contextual video clip classification
Author :
Guler, Samet ; Morde, A. ; Pushee, I. ; Ma, Xiao-Li ; Silverstein, J. ; McAuliffe, S.
Author_Institution :
intuVision Inc., Woburn, MA, USA
Abstract :
Content-based classification of unrestricted video clips from various sources plays an important role in video analysis and search. Thus far, automated video understanding research has focused on videos from specific sources such as aerial, broadcast, and meeting-room footage. For each of these video sources, certain assumptions are made that constrain the problem of content analysis. None of these assumptions holds when analyzing the contents of unrestricted videos. We present a top-down approach to content-based video classification that first captures the overall scene structure and then detects the actors, actions, and objects, along with the context in which they interact and the global motion information from the scene. A scene in a video clip is used as a semantic unit that provides the visual context and the location characteristics associated with the scene, such as indoor or outdoor and the type of each. The location context is tied to the video shooting style of zooming in and out to create a scene description hierarchy. Actors are taken to be detected people and faces, certain poses of people help define the actions and activities, and objects relevant to certain types of events provide additional context. Summary features are created for the scene semantic units based on the actors, actions, object detections, and context. These features were successfully used to train an asymmetric Random Forest classifier for video event classification. The top-down approach presented here has the inherent advantage of being able to describe the video in addition to providing content-based classification. The approach was tested on the Multimedia Event Detection (MED) 2011 dataset with promising results.
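The summary-feature step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the label vocabularies, the `Scene` structure, and the `summary_features` function are all hypothetical names chosen for illustration, and the abstract does not specify the actual feature layout. The sketch assumes that each scene carries a location label, a scalar global-motion value, and a list of detections, and that these are collapsed into one fixed-length vector that a classifier could consume.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical vocabularies; the detector outputs used in the paper
# are not listed in the abstract.
LOCATIONS = ("indoor", "outdoor")
ACTORS = ("person", "face")
ACTIONS = ("standing", "sitting", "bending")
OBJECTS = ("vehicle", "ball", "cake")


@dataclass
class Scene:
    """One scene semantic unit (assumed structure)."""
    location: str                 # location context, e.g. "indoor"
    global_motion: float          # assumed scalar summary of global motion
    detections: list = field(default_factory=list)  # (kind, label) tuples


def summary_features(scene):
    """Collapse a scene's context and detections into a fixed-length vector."""
    counts = Counter(scene.detections)
    # One-hot encoding of the location context.
    features = [float(scene.location == loc) for loc in LOCATIONS]
    # Global motion information for the scene.
    features.append(scene.global_motion)
    # Detection counts per actor, action, and object label.
    for kind, vocab in (("actor", ACTORS), ("action", ACTIONS), ("object", OBJECTS)):
        features.extend(float(counts[(kind, label)]) for label in vocab)
    return features


scene = Scene(
    location="outdoor",
    global_motion=0.4,
    detections=[
        ("actor", "person"),
        ("actor", "person"),
        ("actor", "face"),
        ("action", "standing"),
        ("object", "vehicle"),
    ],
)
vector = summary_features(scene)
```

Vectors of this kind, one per scene, could then be passed to a Random Forest implementation. How the asymmetry of the classifier is realized is not stated in the abstract, so it is left out of the sketch.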
Keywords :
feature extraction; image classification; image motion analysis; learning (artificial intelligence); object detection; video signal processing; MED 2011 dataset; asymmetric random forest classifier; automated video understanding; content analysis; content based classification; contextual video clip classification; global motion information; location characteristics; multimedia event detection; object detection; scene description hierarchy; summary feature; video analysis; video event classification; video search; video shooting style; visual context characteristics
Conference_Title :
Applied Imagery Pattern Recognition Workshop (AIPR), 2012 IEEE
Conference_Location :
Washington, DC
Print_ISBN :
978-1-4673-4558-3
DOI :
10.1109/AIPR.2012.6528196