DocumentCode :
1443684
Title :
Semi-Supervised Hashing for Large-Scale Search
Author :
Wang, Jun ; Kumar, Sanjiv ; Chang, Shih-Fu
Author_Institution :
Bus. Analytics & Math. Sci. Dept., IBM T.J. Watson Res. Center, Yorktown Heights, NY, USA
Volume :
34
Issue :
12
fYear :
2012
Firstpage :
2393
Lastpage :
2406
Abstract :
Hashing-based approximate nearest neighbor (ANN) search in huge databases has become popular due to its computational and memory efficiency. The popular hashing methods, e.g., Locality Sensitive Hashing and Spectral Hashing, construct hash functions based on random or principal projections. The resulting hashes are either not very accurate or are inefficient. Moreover, these methods are designed for a given metric similarity. On the contrary, semantic similarity is usually given in terms of pairwise labels of samples. There exist supervised hashing methods that can handle such semantic similarity, but they are prone to overfitting when labeled data are small or noisy. In this work, we propose a semi-supervised hashing (SSH) framework that minimizes empirical error over the labeled set and an information theoretic regularizer over both labeled and unlabeled sets. Based on this framework, we present three different semi-supervised hashing methods, including orthogonal hashing, nonorthogonal hashing, and sequential hashing. Particularly, the sequential hashing method generates robust codes in which each hash function is designed to correct the errors made by the previous ones. We further show that the sequential learning paradigm can be extended to unsupervised domains where no labeled pairs are available. Extensive experiments on four large datasets (up to 80 million samples) demonstrate the superior performance of the proposed SSH methods over state-of-the-art supervised and unsupervised hashing techniques.
Keywords :
content-based retrieval; file organisation; image retrieval; learning (artificial intelligence); ANN search; SSH framework; computational efficiency; content-based image retrieval; hashing-based approximate nearest neighbor search; information theoretic regularizer; large-scale search; locality sensitive hashing; memory efficiency; nonorthogonal hashing; orthogonal hashing; principal projections; random projections; semantic similarity; semisupervised hashing method; sequential hashing method; sequential learning paradigm; spectral hashing; unlabeled sets; Artificial neural networks; Binary codes; Encoding; Extraterrestrial measurements; Semantics; Semisupervised learning; Sequential analysis; Hashing; binary codes; nearest neighbor search; pairwise labels; semi-supervised hashing; sequential hashing;
fLanguage :
English
Journal_Title :
Pattern Analysis and Machine Intelligence, IEEE Transactions on
Publisher :
IEEE
ISSN :
0162-8828
Type :
jour
DOI :
10.1109/TPAMI.2012.48
Filename :
6148236