DocumentCode :
3164301
Title :
Min-Max Hash for Jaccard Similarity
Author :
Jianqiu Ji ; Jianmin Li ; Shuicheng Yan ; Qi Tian ; Bo Zhang
Author_Institution :
Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
fYear :
2013
fDate :
7-10 Dec. 2013
Firstpage :
301
Lastpage :
309
Abstract :
Min-wise hash is a widely used hashing method for scalable similarity search under Jaccard similarity. In practice, however, many such hash functions must be computed to reach a given precision, which makes hashing computationally expensive. In this paper, we introduce the min-max hash method, which cuts the hashing time in half while having a provably slightly smaller variance in estimating pairwise Jaccard similarity. In addition, the min-max hash estimator involves only pairwise equality checks, which makes it especially suitable for approximate nearest neighbor search. Because min-max hash is as simple as min-wise hash, many extensions built on min-wise hash can be easily adapted to it, and we show how to combine it with b-bit minwise hash. Experiments show that, for the same hash code length, min-max hash takes half the hashing time of min-wise hash while achieving a smaller mean squared error (MSE) in estimating pairwise Jaccard similarity and a better best approximate ratio (BAR) in approximate nearest neighbor search.
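The idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each hash function contributes two sketch entries (the minimum and the maximum hashed value of the set), so a sketch of length K needs only K/2 hash functions, and that similarity is estimated as the fraction of positions where two sketches agree. The names `make_hash_params`, `min_max_sketch`, and `estimate_jaccard`, and the linear hash family modulo a Mersenne prime, are hypothetical choices made for this illustration.

```python
import random

# Modulus for the illustrative linear hash family (an assumption of this sketch;
# a linear hash only approximates a random permutation).
_PRIME = (1 << 61) - 1


def make_hash_params(num_funcs, seed=0):
    """Draw (a, b) pairs defining hash functions h(x) = (a * x + b) mod _PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, _PRIME), rng.randrange(0, _PRIME))
            for _ in range(num_funcs)]


def min_max_sketch(items, params):
    """Build a sketch with two entries (min and max) per hash function.

    `items` must be a non-empty collection of non-negative integers below _PRIME.
    The returned list has length 2 * len(params).
    """
    sketch = []
    for a, b in params:
        values = [(a * x + b) % _PRIME for x in items]
        sketch.append(min(values))
        sketch.append(max(values))
    return sketch


def estimate_jaccard(sketch_a, sketch_b):
    """Estimate Jaccard similarity as the fraction of matching sketch positions.

    Only equality checks are used, as the abstract describes.
    """
    matches = sum(1 for u, v in zip(sketch_a, sketch_b) if u == v)
    return matches / len(sketch_a)


if __name__ == "__main__":
    params = make_hash_params(128, seed=42)   # 128 functions -> 256 sketch entries
    set_a = set(range(0, 100))
    set_b = set(range(50, 150))               # true Jaccard similarity is 50/150
    est = estimate_jaccard(min_max_sketch(set_a, params),
                           min_max_sketch(set_b, params))
    print(f"estimated Jaccard similarity: {est:.3f}")
```

Under these assumptions, a plain min-wise sketch of the same length would need twice as many hash evaluations per element, which is the source of the halved hashing time the abstract reports.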
Keywords :
mean square error methods; minimax techniques; pattern clustering; search problems; approximate nearest neighbor search; b-bit minwise hash; best approximate ratio; hash code; hashing method; hashing time reduction; mean squared error; min-max hash estimator; min-max hash method; min-wise hash; pairwise Jaccard similarity estimation; pairwise equality checking; scalable similarity search; Approximation algorithms; Approximation methods; Computational efficiency; Computer science; Educational institutions; Laboratories; Nearest neighbor searches; Jaccard similarity; approximate nearest neighbor search; min-max hash; min-wise hash;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining (ICDM), 2013 IEEE 13th International Conference on
Conference_Location :
Dallas, TX
ISSN :
1550-4786
Type :
conf
DOI :
10.1109/ICDM.2013.119
Filename :
6729514