DocumentCode :
3707892
Title :
Cluster encoding for modelling temporal variation in video
Author :
Negar Rostamzadeh;Jasper Uijlings;Ionuţ Mironică;Mojtaba Khomami Abadi;Bogdan Ionescu;Nicu Sebe
Author_Institution :
University of Trento, Italy
fYear :
2015
Firstpage :
3640
Lastpage :
3644
Abstract :
Classical Bag-of-Words methods represent videos by modeling the variation of local visual descriptors throughout the video. This approach mixes variation in time and space indiscriminately, even though these dimensions are fundamentally different. In this paper we therefore present a novel method for video representation which explicitly captures variation over time. We first create frame-based features using standard Bag-of-Words techniques. To model the variation in time over these frame-based features, we introduce Hard and Soft Cluster Encoding, novel techniques inspired by the Fisher Kernel [1] and VLAD [2]. Results on the Rochester ADL [3] and Blip10k [4] datasets show that our method yields improvements of 6.6% and 7.4%, respectively, over our baselines. On Blip10k we outperform the state of the art by 3.6% when using only visual features.
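The abstract does not give the exact formulation of Hard and Soft Cluster Encoding, so the following is only a minimal sketch of the general idea it points to: VLAD-style aggregation of residuals between frame-level features and cluster centers, with either a hard (nearest-center) or soft (weighted) assignment. The function names, the `beta` softness parameter, the softmax weighting, and the final L2 normalization are all assumptions made for illustration and may differ from the paper's method. The cluster centers are assumed to come from some clustering (for example k-means) of frame-level Bag-of-Words features.

```python
import numpy as np

def _l2_normalize(v):
    # Guard against an all-zero vector (assumed normalization step).
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def _sq_dists(frames, centers):
    # Squared Euclidean distance from every frame to every center: shape (T, K).
    diff = frames[:, None, :] - centers[None, :, :]
    return (diff ** 2).sum(axis=2)

def hard_cluster_encoding(frames, centers):
    """Hypothetical VLAD-style encoding: each frame contributes its
    residual only to its nearest cluster center."""
    frames = np.asarray(frames, dtype=float)    # (T, D) frame-level features
    centers = np.asarray(centers, dtype=float)  # (K, D) cluster centers
    nearest = _sq_dists(frames, centers).argmin(axis=1)
    enc = np.zeros_like(centers)
    for k in range(centers.shape[0]):
        members = frames[nearest == k]
        if len(members):
            enc[k] = (members - centers[k]).sum(axis=0)
    return _l2_normalize(enc.ravel())           # (K * D,) video descriptor

def soft_cluster_encoding(frames, centers, beta=1.0):
    """Hypothetical soft variant: each frame contributes to every center,
    weighted by a softmax over negative squared distances (assumption)."""
    frames = np.asarray(frames, dtype=float)
    centers = np.asarray(centers, dtype=float)
    logits = -beta * _sq_dists(frames, centers)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)            # (T, K)
    residuals = frames[:, None, :] - centers[None, :, :]     # (T, K, D)
    enc = (weights[:, :, None] * residuals).sum(axis=0)      # (K, D)
    return _l2_normalize(enc.ravel())
```

Both functions map a variable-length sequence of frame features to a fixed-length vector of size K * D, which is what makes such encodings usable with a standard classifier. In this sketch, a large `beta` makes the soft encoding approach the hard one.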
Keywords :
"Encoding","Kernel","Standards","Hidden Markov models","Visualization","Histograms","Vocabulary"
Publisher :
IEEE
Conference_Title :
Image Processing (ICIP), 2015 IEEE International Conference on
Type :
conf
DOI :
10.1109/ICIP.2015.7351483
Filename :
7351483