DocumentCode :
2984472
Title :
Efficient Learning for Hashing Proportional Data
Author :
Zhao Xu ; Kersting, Kristian ; Bauckhage, Christian
Author_Institution :
Schloss Birlinghoven, Fraunhofer IAIS, St. Augustin, Germany
fYear :
2012
fDate :
10-13 Dec. 2012
Firstpage :
735
Lastpage :
744
Abstract :
Spectral hashing (SH) seeks compact binary codes of data points so that Hamming distances between codes correlate with data similarity. Quickly learning such codes typically boils down to principal component analysis (PCA). However, this is only justified for normally distributed data. For proportional data (normalized histograms), this is not the case. Due to the sum-to-unity constraint, features that are as independent as possible will not all be uncorrelated. In this paper, we show that a linear-time transformation efficiently copes with sum-to-unity constraints: first, we select a small number K of diverse data points by maximizing the volume of the simplex spanned by these prototypes; second, we represent each data point by means of its cosine similarities to the K selected prototypes. This maximum volume hashing is sensible since each dimension in the transformed space is likely to follow a von Mises (vM) distribution, and, in very high dimensions, the vM distribution closely resembles a Gaussian distribution. This justifies employing PCA on the transformed data. Our extensive experiments validate this: maximum volume hashing outperforms spectral hashing and other state-of-the-art techniques.
Keywords :
Gaussian distribution; cryptography; learning (artificial intelligence); principal component analysis; Hamming distance; PCA; binary code; data learning; data similarity; linear-time transformation; maximum volume hashing; normalized histogram; proportional data hashing; simplex volume; spectral hashing; sum-to-unity constraint; von Mises distribution; Binary codes; Eigenvalues and eigenfunctions; Optimization; Semantics; Vectors; Dimensionality Reduction; Proportional Data
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining (ICDM), 2012 IEEE 12th International Conference on
Conference_Location :
Brussels
ISSN :
1550-4786
Print_ISBN :
978-1-4673-4649-8
Type :
conf
DOI :
10.1109/ICDM.2012.142
Filename :
6413855