Encoding spatio-temporal distribution by generalized VLAD for action recognition

Author

Biyun Sheng ; Yan Yan ; Changyin Sun

Author_Institution

Sch. of Autom., Southeast Univ., Nanjing, China

fYear

2015

fDate

3-6 May 2015

Firstpage

620

Lastpage

625

Abstract

The location of interest points is an important cue for action recognition. To model their spatio-temporal distribution, we propose a novel position feature constructed from the normalized pairwise relative positions of the points. The Vector of Locally Aggregated Descriptors (VLAD), which aggregates the differences between descriptors and visual words, has achieved promising performance. However, the original VLAD assigns equal weights to all difference vectors and ignores the zero-order statistics of local descriptors. In this paper, we present Generalized VLAD (GVLAD), an extension of VLAD that encodes the position features as well as local appearance descriptors while taking both different weights and zero-order information into account. State-of-the-art performance on two benchmark datasets validates the effectiveness of the proposed method.
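
The abstract gives no formulas, so the following is only a minimal sketch of the two ingredients it names, not the authors' GVLAD. `vlad_encode` is the standard, equally weighted VLAD baseline that the abstract contrasts against; it also returns the per-word assignment counts as one simple example of a zero-order statistic. `relative_positions` is one plausible reading of "normalized pairwise relative positions": the function names, the (x, y, t) point layout, and the normalization by per-axis extent are all assumptions made for illustration.

```python
import numpy as np


def vlad_encode(descriptors, codebook):
    """Standard VLAD baseline: sum residuals to the nearest visual word.

    descriptors: (N, D) local descriptors; codebook: (K, D) visual words.
    Returns the L2-normalized (K*D,) VLAD vector and the (K,) assignment
    counts (a zero-order statistic, included here only for illustration).
    """
    descriptors = np.asarray(descriptors, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    # Squared Euclidean distance from every descriptor to every word: (N, K).
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)

    K, D = codebook.shape
    residuals = np.zeros((K, D))
    counts = np.zeros(K)
    for k in range(K):
        members = descriptors[nearest == k]
        counts[k] = len(members)
        if len(members):
            # Every difference vector gets the same weight (the baseline).
            residuals[k] = (members - codebook[k]).sum(axis=0)

    vec = residuals.ravel()
    norm = np.linalg.norm(vec)
    if norm > 0:
        vec = vec / norm
    return vec, counts


def relative_positions(points):
    """Pairwise relative positions of interest points (assumed layout x, y, t).

    Normalizing by the per-axis extent of the point set is an assumption;
    the abstract does not specify the normalization used.
    Returns an (N*(N-1), 3) array of differences p_i - p_j for i != j.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    diffs = points[:, None, :] - points[None, :, :]  # (N, N, 3)
    off_diagonal = ~np.eye(n, dtype=bool)            # drop the i == j pairs
    diffs = diffs[off_diagonal]
    extent = points.max(axis=0) - points.min(axis=0)
    extent[extent == 0] = 1.0                        # avoid division by zero
    return diffs / extent
```

Under these assumptions, the rows returned by `relative_positions` could be treated as position features and passed to an encoder such as `vlad_encode` alongside the appearance descriptors; how GVLAD actually weights the difference vectors is not stated in this record.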

Keywords

image recognition; spatiotemporal phenomena; video coding; GVLAD; action recognition cue; benchmark datasets; difference vectors; generalized VLAD; interest point location information; local appearance descriptors; normalized pairwise relative point position; position feature; position feature encoding; spatio-temporal distribution encoding; spatio-temporal distribution modelling; vector-of-locally aggregated descriptors; visual words; zero-order information; Accuracy; Cameras; Computational modeling; Dictionaries; Encoding; Three-dimensional displays; Visualization;

fLanguage

English

Publisher

IEEE

Conference_Titel

Electrical and Computer Engineering (CCECE), 2015 IEEE 28th Canadian Conference on

Conference_Location

Halifax, NS

ISSN

0840-7789

Print_ISBN

978-1-4799-5827-6

Type

conf

DOI

10.1109/CCECE.2015.7129346

Filename

7129346