DocumentCode
3405160
Title
Learning mid-level features for recognition
Author
Boureau, Y-Lan ; Bach, Francis ; LeCun, Yann ; Ponce, Jean
fYear
2010
fDate
13-18 June 2010
Firstpage
2559
Lastpage
2566
Abstract
Many successful models for scene or object recognition transform low-level descriptors (such as Gabor filter responses, or SIFT descriptors) into richer representations of intermediate complexity. This process can often be broken down into two steps: (1) a coding step, which performs a pointwise transformation of the descriptors into a representation better adapted to the task, and (2) a pooling step, which summarizes the coded features over larger neighborhoods. Several combinations of coding and pooling schemes have been proposed in the literature. The goal of this paper is threefold. We seek to establish the relative importance of each step of mid-level feature extraction through a comprehensive cross evaluation of several types of coding modules (hard and soft vector quantization, sparse coding) and pooling schemes (by taking the average, or the maximum), which obtains state-of-the-art performance or better on several recognition benchmarks. We show how to improve the best performing coding scheme by learning a supervised discriminative dictionary for sparse coding. We provide theoretical and empirical insight into the remarkable performance of max pooling. By teasing apart components shared by modern mid-level feature extractors, our approach aims to facilitate the design of better recognition architectures.
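The two-step pipeline described in the abstract (a pointwise coding step followed by a pooling step) can be illustrated with a minimal sketch. It assumes a fixed toy dictionary and made-up 2-D descriptors standing in for SIFT or Gabor responses. Hard vector quantization is taken to mean a one-hot code for the nearest codeword under squared Euclidean distance. Soft vector quantization uses a Gaussian-kernel assignment with a hypothetical smoothness parameter `beta`. Sparse coding and the supervised dictionary learning mentioned above are omitted. The function names are illustrative only and are not the authors' implementation.

```python
import math

def sq_dist(x, d):
    """Squared Euclidean distance between a descriptor and a codeword."""
    return sum((xi - di) ** 2 for xi, di in zip(x, d))

def hard_vq(x, dictionary):
    """Hard VQ coding: one-hot vector marking the nearest codeword."""
    dists = [sq_dist(x, d) for d in dictionary]
    k = dists.index(min(dists))
    return [1.0 if j == k else 0.0 for j in range(len(dictionary))]

def soft_vq(x, dictionary, beta=1.0):
    """Soft VQ coding (assumed kernel form): weights proportional to
    exp(-beta * squared distance), normalized to sum to 1."""
    dists = [sq_dist(x, d) for d in dictionary]
    m = min(dists)  # shift by the minimum distance for numerical stability
    w = [math.exp(-beta * (dd - m)) for dd in dists]
    total = sum(w)
    return [wi / total for wi in w]

def average_pool(codes):
    """Average pooling: per-codeword mean over all codes in the neighborhood."""
    n = len(codes)
    return [sum(col) / n for col in zip(*codes)]

def max_pool(codes):
    """Max pooling: per-codeword maximum over all codes in the neighborhood."""
    return [max(col) for col in zip(*codes)]

# Toy usage: hypothetical dictionary of 3 codewords, 4 descriptors in one neighborhood.
dictionary = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
descriptors = [[0.1, 0.0], [0.9, 1.2], [0.2, -0.1], [3.5, 0.2]]

codes = [hard_vq(x, dictionary) for x in descriptors]  # coding step
avg = average_pool(codes)  # pooling step: [0.5, 0.25, 0.25]
mx = max_pool(codes)       # pooling step: [1.0, 1.0, 1.0]
print(avg, mx)
```

With hard codes, average pooling reports how often each codeword is selected in the neighborhood. Max pooling reports only whether each codeword is selected at least once. This is the distinction between the two pooling schemes that the abstract's cross-evaluation compares.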
Keywords
Gabor filters; feature extraction; learning (artificial intelligence); object recognition; vector quantisation; Gabor filter responses; SIFT descriptors; coding step; feature extraction; low level descriptors; mid level features learning; object recognition; pointwise transformation; pooling step; sparse coding; vector quantization; Convolutional codes; Dictionaries; Feature extraction; Gabor filters; Image classification; Image coding; Image representation; Layout; Object recognition; Vector quantization;
fLanguage
English
Publisher
IEEE
Conference_Title
Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on
Conference_Location
San Francisco, CA
ISSN
1063-6919
Print_ISBN
978-1-4244-6984-0
Type
conf
DOI
10.1109/CVPR.2010.5539963
Filename
5539963