Vectors of locally aggregated centers for compact video representation

Author

Abbas, Alhabib ; Deligiannis, Nikos ; Andreopoulos, Yiannis

Author_Institution

Dept. of Electron. & Electr. Eng., Univ. Coll. London (UCL), London, UK

fYear

2015

fDate

June 29 2015-July 3 2015

Firstpage

1

Lastpage

6

Abstract

We propose a novel vector aggregation technique for compact video representation, with application to accurate similarity detection within large video datasets. The current state of the art in visual search is the vector of locally aggregated descriptors (VLAD) of Jegou et al. VLAD generates compact video representations based on scale-invariant feature transform (SIFT) vectors (extracted per frame) and local feature centers computed over a training set. With the aim of increasing robustness to visual distortions, we propose a new approach that operates at a coarser level in the feature representation. We create vectors of locally aggregated centers (VLAC) by first clustering SIFT features to obtain local feature centers (LFCs) and then encoding the latter with respect to given centers of local feature centers (CLFCs), extracted from a training set. The sums of differences between the LFCs and the CLFCs are aggregated to generate an extremely compact video description used for accurate video segment similarity detection. Experimentation using a video dataset, comprising more than 1000 minutes of content from the Open Video Project, shows that VLAC obtains substantial gains in terms of mean Average Precision (mAP) against VLAD and the hyper-pooling method of Douze et al., under the same compaction factor and the same set of distortions.
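
The two-stage encoding described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the clustering algorithm, the assignment rule, or any normalization, so the plain k-means, nearest-center assignment, and L2 normalization below are assumptions, and the names `kmeans`, `aggregate`, `vlac`, and `n_lfc` are hypothetical. SIFT extraction is left out; the input is assumed to be an array of descriptors already extracted from one video segment.

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed choice); returns a (k, d) array of centers."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centers from k distinct input points.
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every point to every center: shape (n, k).
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # leave a center unchanged if its cluster is empty
                centers[j] = members.mean(axis=0)
    return centers

def aggregate(vectors, codebook):
    """For each codeword, sum the differences (vector - codeword) of the vectors
    assigned to it, then flatten and L2-normalise (normalisation is an assumption)."""
    vectors = np.asarray(vectors, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    d2 = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    sums = np.zeros_like(codebook)
    for i, j in enumerate(nearest):
        sums[j] += vectors[i] - codebook[j]
    flat = sums.ravel()
    norm = np.linalg.norm(flat)
    return flat / norm if norm > 0 else flat

def vlac(segment_descriptors, clfcs, n_lfc=8):
    """Sketch of a VLAC-style encoding for one video segment.

    Stage 1: cluster the segment's SIFT descriptors into local feature centers (LFCs).
    Stage 2: aggregate the LFCs against the trained centers of LFCs (CLFCs).
    """
    lfcs = kmeans(segment_descriptors, n_lfc)
    return aggregate(lfcs, clfcs)
```

Under these assumptions, calling `vlac(descriptors, clfcs)` with an `(n, 128)` descriptor array and a `(K, 128)` array of CLFCs yields a vector of length `K * 128`. Passing the raw descriptors straight to `aggregate` with a codebook of local feature centers corresponds to a VLAD-style encoding, which makes the difference between the two levels of aggregation easy to see.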

Keywords

image matching; image representation; pattern clustering; transforms; video signal processing; CLFC; LFC; Open Video Project; SIFT feature clustering; SIFT vectors; VLAC; VLAD; centers of local feature centers; compact video representation; feature representation; hyper-pooling method; mAP; mean average precision; scale-invariant feature transform; vector aggregation technique; vector of locally aggregated descriptors; vectors of locally aggregated centers; video segment similarity detection; Compaction; Distortion; Feature extraction; Principal component analysis; Robustness; Training; Visualization; video similarity

fLanguage

English

Publisher

IEEE

Conference_Titel

Multimedia and Expo (ICME), 2015 IEEE International Conference on

Conference_Location

Turin

Type

conf

DOI

10.1109/ICME.2015.7177501

Filename

7177501