Author_Institution :
Sch. of Electron. Eng., Xidian Univ., Xi'an, China
Abstract :
Good image representations are essential for many computer vision tasks. In this study, the authors propose a hierarchical spatial pyramid max pooling method based on scale-invariant feature transform (SIFT) features and sparse coding, which builds image representations through a hierarchical network. The method has three parts: SIFT feature extraction, sparse coding and spatial pyramid max pooling. To mimic the visual cortex, spatial pyramid max pooling is first performed on the original SIFT features within each image patch. This distils the features and extracts the most distinctive and significant feature in each local patch, the SIFT-pooled feature, which is used in place of the original SIFT features. Then, using the K-singular value decomposition (K-SVD) algorithm, a dictionary is trained on randomly selected SIFT-pooled features and all SIFT-pooled features are sparse coded with the trained dictionary. Finally, spatial pyramid max pooling is carried out again, at the image level, on the sparse codes of all image patches. The image representation is built by concatenating the pooled features of each level. The authors use the algorithm with a simple linear support vector machine (SVM) for image classification on three datasets, Caltech-101, Caltech-256 and 15-Scenes, and the experimental results show that the authors' algorithm achieves competitive performance compared with recently published results.
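The image-level pooling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `spatial_pyramid_max_pool`, the grid sizes in `levels`, the use of normalized patch coordinates, and the choice to fill empty cells with zeros are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

def spatial_pyramid_max_pool(codes, xy, levels=(1, 2, 4)):
    """Max-pool per-patch codes over a spatial pyramid and concatenate.

    codes  : (n_patches, n_atoms) array, one sparse code per image patch.
    xy     : (n_patches, 2) array of patch centres, normalized to [0, 1].
    levels : grid sizes per pyramid level (assumed values, not from the paper).
    """
    codes = np.asarray(codes, dtype=float)
    xy = np.asarray(xy, dtype=float)
    n_atoms = codes.shape[1]
    pooled = []
    for g in levels:
        # Map each patch to a cell of the g x g grid (clip so 1.0 stays in range).
        cell = np.minimum((xy * g).astype(int), g - 1)
        flat = cell[:, 1] * g + cell[:, 0]  # row-major cell index
        for c in range(g * g):
            mask = flat == c
            if mask.any():
                # Keep the strongest response of each dictionary atom in this cell.
                pooled.append(codes[mask].max(axis=0))
            else:
                # Assumption: an empty cell contributes a zero vector.
                pooled.append(np.zeros(n_atoms))
    # The image representation is the concatenation over all cells and levels.
    return np.concatenate(pooled)

# Usage: three patches, a two-atom dictionary, and a two-level pyramid.
codes = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]])
xy = np.array([[0.1, 0.1], [0.9, 0.1], [0.9, 0.9]])
representation = spatial_pyramid_max_pool(codes, xy, levels=(1, 2))
```

With grid sizes `levels`, the output length is the number of dictionary atoms times the total number of cells, so the example above yields a vector of length 2 x (1 + 4) = 10. A vector of this kind would then be passed to the linear SVM.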
Keywords :
computer vision; feature extraction; image classification; image coding; image representation; singular value decomposition; transforms; 15-Scenes dataset; Caltech-101 dataset; Caltech-256 dataset; K-singular value decomposition algorithm; SIFT-pooled feature extraction; computer vision tasks; hierarchical spatial pyramid max pooling method; image representations; linear SVM; linear support vector machine; scale-invariant feature transform feature extraction; sparse coding