Title :
Hashing with Generalized Nyström Approximation
Author :
Jeong-Min Yun ; Saehoon Kim ; Seungjin Choi
Author_Institution :
Dept. of Comput. Sci. & Eng., Pohang Univ. of Sci. & Technol., Pohang, South Korea
Abstract :
Hashing, which involves learning binary codes to embed high-dimensional data into a similarity-preserving low-dimensional Hamming space, is often formulated as linear dimensionality reduction followed by binary quantization. Linear dimensionality reduction based on the maximum variance formulation requires the leading eigenvectors of the data covariance or graph Laplacian matrix. Computing leading singular vectors or eigenvectors when both the dimension and the sample size are large is a main bottleneck in most data-driven hashing methods. In this paper we address the use of the generalized Nyström method, in which a subset of rows and columns is used to approximately compute the leading singular vectors of the data matrix, in order to improve the scalability of hashing methods for high-dimensional data with large sample size. In particular, we validate the useful behavior of the generalized Nyström approximation with uniform sampling for a recently developed hashing method based on principal component analysis (PCA) followed by iterative quantization, referred to as PCA+ITQ, due to Gong and Lazebnik. We compare the performance of the generalized Nyström approximation with uniform and non-uniform sampling to the full singular value decomposition (SVD) method, confirming that uniform sampling dramatically improves the computational and space complexities while sacrificing little performance. In addition, we present low-rank approximation error bounds for the generalized Nyström approximation with uniform sampling, which are not a trivial extension of available results for the non-uniform sampling case.
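The row-and-column sampling idea in the abstract can be illustrated with a minimal sketch. It assumes the common pseudoskeleton (CUR-style) form A ≈ C W⁺ R, where C holds uniformly sampled columns, R holds uniformly sampled rows, and W is their intersection. The function name `generalized_nystrom`, its parameters, and the rank-truncated pseudo-inverse are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def generalized_nystrom(A, n_rows, n_cols, rank, rng):
    """Return (C, W_pinv, R) so that A is approximated by C @ W_pinv @ R.

    Hypothetical helper: a sketch of a pseudoskeleton approximation
    with uniform sampling, not the method as published.
    """
    m, n = A.shape
    # Uniform sampling (without replacement) of row and column indices.
    row_idx = rng.choice(m, size=n_rows, replace=False)
    col_idx = rng.choice(n, size=n_cols, replace=False)
    C = A[:, col_idx]                    # m x n_cols sampled columns
    R = A[row_idx, :]                    # n_rows x n sampled rows
    W = A[np.ix_(row_idx, col_idx)]      # n_rows x n_cols intersection
    # Rank-truncated pseudo-inverse of the small intersection matrix.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    W_pinv = Vt[:rank].T @ np.diag(1.0 / s[:rank]) @ U[:, :rank].T
    return C, W_pinv, R

# Usage on a synthetic, exactly rank-5 matrix (assumed test data).
rng = np.random.default_rng(0)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 100))
C, W_pinv, R = generalized_nystrom(A, n_rows=20, n_cols=20, rank=5, rng=rng)
rel_err = np.linalg.norm(A - C @ W_pinv @ R) / np.linalg.norm(A)
print(f"relative Frobenius error: {rel_err:.2e}")
```

Only the small intersection matrix is decomposed here, which is where the savings over a full SVD of the data matrix would come from. On real, only approximately low-rank data the error depends on the sample sizes and the chosen rank.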
Keywords :
Hamming codes; approximation theory; binary codes; computational complexity; eigenvalues and eigenfunctions; file organisation; graph theory; iterative methods; matrix algebra; principal component analysis; quantisation (signal); sampling methods; PCA-based hashing method; SVD method; binary quantization; computational complexities; data covariance; data matrix; data-driven hashing methods; eigenvectors; generalized Nystrom approximation; graph Laplacian matrix; high-dimensional data; iterative quantization; large sample size; leading singular vectors; learning binary codes; linear dimensionality reduction; low-rank approximation error bounds; maximum variance formulation; nonuniform sampling; principal component analysis-based hashing method; similarity-preserving low-dimensional Hamming space; singular value decomposition method; space complexities; uniform sampling; Approximation algorithms; Approximation error; Binary codes; Principal component analysis; Quantization; Vectors; CUR decomposition; generalized Nystrom approximation; hashing; pseudoskeleton approximation; uniform sampling;
Conference_Title :
Data Mining (ICDM), 2012 IEEE 12th International Conference on
Conference_Location :
Brussels
Print_ISBN :
978-1-4673-4649-8
DOI :
10.1109/ICDM.2012.22