DocumentCode :
3468954
Title :
Spatio-temporal Context Modeling for BoW-Based Video Classification
Author :
Yi, Saehoon ; Pavlovic, Vladimir
Author_Institution :
Rutgers, State Univ. of New Jersey, Piscataway, NJ, USA
fYear :
2013
fDate :
2-8 Dec. 2013
Firstpage :
779
Lastpage :
786
Abstract :
We propose an autocorrelation Cox process that extends the traditional bag-of-words representation to model the spatio-temporal context within a video sequence. Bag-of-words models are effective tools for representing a video by a histogram of visual words that describe local appearance and motion. A major limitation of this model is its inability to encode the spatio-temporal structure of visual words pertaining to the context of the video. Several works have proposed to remedy this by learning the pairwise correlations between words. However, pairwise analysis leads to a quadratic increase in the number of features, making the models prone to overfitting and challenging to learn from data. The proposed autocorrelation Cox process model encodes the contextual information within a video sequence in a compact way, leading to improved classification performance. Spatio-temporal autocorrelations of visual words estimated from the Cox process are coupled with information gain feature selection to discern the essential structure for the classification task. Experiments on crowd activity and human action datasets illustrate that the proposed model achieves state-of-the-art performance while providing intuitive spatio-temporal descriptors of the video context.
Keywords :
correlation methods; feature selection; image classification; image representation; image sequences; video signal processing; BoW-based video classification; autocorrelation Cox process model; bag-of-words models; bag-of-words representation; classification performance; classification task; crowd activity; histogram of visual words; human action dataset; information gain feature selection; local appearance; pairwise analysis; pairwise correlations; spatio-temporal autocorrelations; spatio-temporal context modeling; spatio-temporal descriptors; video representation; video sequence; Context; Context modeling; Correlation; Histograms; Kernel; Trajectory; Visualization; AutoCox; Bag of words; human action video classification; spatio-temporal context;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Vision Workshops (ICCVW), 2013 IEEE International Conference on
Conference_Location :
Sydney, NSW
Type :
conf
DOI :
10.1109/ICCVW.2013.107
Filename :
6755976