Title :
Vectors of locally aggregated centers for compact video representation
Author :
Abbas, Alhabib ; Deligiannis, Nikos ; Andreopoulos, Yiannis
Author_Institution :
Dept. of Electron. & Electr. Eng., Univ. Coll. London (UCL), London, UK
fDate :
June 29, 2015 - July 3, 2015
Abstract :
We propose a novel vector aggregation technique for compact video representation, with application to accurate similarity detection within large video datasets. The current state of the art in visual search is the vector of locally aggregated descriptors (VLAD) of Jegou et al. VLAD generates compact video representations from scale-invariant feature transform (SIFT) vectors (extracted per frame) and local feature centers computed over a training set. With the aim of increasing robustness to visual distortions, we propose a new approach that operates at a coarser level in the feature representation. We create vectors of locally aggregated centers (VLAC) by first clustering SIFT features to obtain local feature centers (LFCs) and then encoding the latter with respect to given centers of local feature centers (CLFCs), extracted from a training set. The sums of differences between the LFCs and the CLFCs are aggregated to generate an extremely compact video description used for accurate video segment similarity detection. Experimentation on a video dataset comprising more than 1000 minutes of content from the Open Video Project shows that VLAC obtains substantial gains in mean Average Precision (mAP) over VLAD and the hyper-pooling method of Douze et al., under the same compaction factor and the same set of distortions.
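The two-stage encoding described in the abstract can be illustrated with a minimal pure-Python sketch. It assumes plain k-means (Lloyd's algorithm) for clustering a segment's SIFT descriptors into LFCs, CLFCs that are already available from training, nearest-center hard assignment, and a final L2 normalization as in common VLAD practice. The function names (`kmeans`, `aggregate`, `vlac`) and the toy 2-D data are hypothetical; this is not the authors' implementation, and the paper's exact clustering, normalization and dimensionality choices may differ. Note that `aggregate(descriptors, codebook)` on its own corresponds to VLAD, while VLAC applies the same aggregation to LFCs instead of raw descriptors.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(x: Sequence[float], centers: Sequence[Sequence[float]]) -> int:
    """Index of the center closest to x (ties go to the lowest index)."""
    return min(range(len(centers)), key=lambda k: sq_dist(x, centers[k]))


def kmeans(points: List[Vector], k: int, iters: int = 20, seed: int = 0) -> List[Vector]:
    """Plain Lloyd's k-means; ASSUMED here as the clustering that yields LFCs."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the previous center if a cluster becomes empty
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
    return centers


def aggregate(vectors: List[Vector], codebook: List[Vector]) -> Vector:
    """VLAD-style aggregation: for each codeword, sum the residuals of the
    vectors assigned to it, concatenate the sums, then L2-normalize."""
    d = len(codebook[0])
    acc = [[0.0] * d for _ in codebook]
    for v in vectors:
        k = nearest(v, codebook)
        for i in range(d):
            acc[k][i] += v[i] - codebook[k][i]
    flat = [x for row in acc for x in row]
    norm = math.sqrt(sum(x * x for x in flat))
    return [x / norm for x in flat] if norm > 0 else flat


def vlac(segment_descriptors: List[Vector], num_lfcs: int, clfcs: List[Vector]) -> Vector:
    """Hypothetical VLAC sketch: cluster one segment's descriptors into LFCs,
    then aggregate those LFCs against the training-set CLFCs."""
    lfcs = kmeans(segment_descriptors, num_lfcs)
    return aggregate(lfcs, clfcs)


if __name__ == "__main__":
    # Toy 2-D "descriptors"; real SIFT vectors are 128-dimensional.
    descs = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 4.9]]
    clfcs = [[0.0, 0.0], [5.0, 5.0]]
    desc = vlac(descs, num_lfcs=2, clfcs=clfcs)
    print(len(desc))  # 4 = number of CLFCs * descriptor dimension
```

The output length depends only on the number of CLFCs and the descriptor dimension, not on how many frames or descriptors a segment contains, which is what makes the representation compact and directly comparable across segments.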
Keywords :
image matching; image representation; pattern clustering; transforms; video signal processing; CLFC; LFC; Open Video Project; SIFT feature clustering; SIFT vectors; VLAC; VLAD; centers of local feature centers; compact video representation; feature representation; hyper-pooling method; mAP; mean average precision; scale-invariant feature transform; vector aggregation technique; vector of locally aggregated descriptors; vectors of locally aggregated centers; video segment similarity detection; Compaction; Distortion; Feature extraction; Principal component analysis; Robustness; Training; Visualization; video similarity
Conference_Title :
Multimedia and Expo (ICME), 2015 IEEE International Conference on
Conference_Location :
Turin
DOI :
10.1109/ICME.2015.7177501