DocumentCode
2857049
Title
Recursive Sparse, Spatiotemporal Coding
Author
Dean, Thomas ; Washington, Rich ; Corrado, Greg
Author_Institution
Google Inc., Mountain View, CA, USA
fYear
2009
fDate
14-16 Dec. 2009
Firstpage
645
Lastpage
650
Abstract
We present a new approach to learning sparse, spatiotemporal codes in which the number of basis vectors, their orientations, velocities and the size of their receptive fields change over the duration of unsupervised training. The algorithm starts with a relatively small, initial basis with minimal temporal extent. This initial basis is obtained through conventional sparse coding techniques and is expanded over time by recursively constructing a new basis consisting of basis vectors with larger temporal extent that proportionally conserve regions of previously trained weights. These proportionally conserved weights are combined with the result of adjusting newly added weights to represent a greater range of primitive motion features. The size of the current basis is determined probabilistically by sampling from existing basis vectors according to their activation on the training set. The resulting algorithm produces bases consisting of filters that are bandpass, spatially oriented and temporally diverse in terms of their transformations and velocities. The basic methodology borrows inspiration from the layer-by-layer learning of multiple-layer restricted Boltzmann machines developed by Geoff Hinton and his students. Indeed, we can learn multiple-layer sparse codes by training a stack of denoising autoencoders, but we have had greater success using L1 regularized regression in a variation on Olshausen and Field's original SPARSENET. To accelerate learning and focus attention, we apply a space-time interest-point operator that selects for periodic motion. This attentional mechanism enables us to efficiently compute and compactly represent a broad range of interesting motion. We demonstrate the utility of our approach by using it to recognize human activity in video. Our algorithm meets or exceeds the performance of state-of-the-art activity-recognition methods.
Keywords
image motion analysis; image recognition; unsupervised learning; video coding; Boltzmann machines; SPARSENET; activity recognition; basis vectors; denoising autoencoders; primitive motion features; space-time interest-point operator; sparse coding; spatiotemporal codes; training set; unsupervised training; Acceleration; Band pass filters; Humans; Machine learning; Motion pictures; Noise reduction; Sampling methods; Spatiotemporal phenomena; Statistics; USA Councils; human-activity recognition
fLanguage
English
Publisher
IEEE
Conference_Titel
Multimedia, 2009. ISM '09. 11th IEEE International Symposium on
Conference_Location
San Diego, CA
Print_ISBN
978-1-4244-5231-6
Electronic_ISBN
978-0-7695-3890-7
Type
conf
DOI
10.1109/ISM.2009.28
Filename
5365771