Title :
Frequency Based Locality Sensitive Hashing
Author :
Ling, Kang ; Wu, Gangshan
Author_Institution :
State Key Lab. for Novel Software Technol., Nanjing Univ., Nanjing, China
Abstract :
Nearest Neighbor (NN) search is of major importance to many applications, such as information retrieval and data mining. However, finding the NN in high-dimensional space has proved to be time-consuming. In recent years, Locality Sensitive Hashing (LSH) has been proposed to solve the Approximate Nearest Neighbor (ANN) problem. The main drawback of LSH is that it requires a large amount of memory to achieve good performance, which makes it poorly suited to today's massive-data applications. We analyze the generic LSH scheme as well as the properties of LSH hash functions based on p-stable distributions, and propose a new LSH scheme called Frequency Based Locality Sensitive Hashing (FBLSH). FBLSH uses a single function based on p-stable distributions as the hash function of a hash table, and it sets a frequency threshold m: only points that collide with the query point more than m times can be candidate ANNs. FBLSH is easy to implement, and through experiments we show that it can reduce the extra space cost by several orders of magnitude with less (or similar) time cost while achieving better search quality compared with LSH based on p-stable distributions.
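The frequency-threshold idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `FrequencyLSH`, its parameters (`num_tables`, `bucket_width`, `seed`), and the final exact-distance re-ranking step are assumptions made for illustration. It assumes the usual p-stable hash form h(v) = floor((a . v + b) / w) with Gaussian (2-stable) projections, one such function per table, and keeps only points that share a bucket with the query in more than m tables.

```python
import math
import random
from collections import Counter, defaultdict


class FrequencyLSH:
    """Illustrative frequency-threshold LSH index (hypothetical, not the paper's code)."""

    def __init__(self, dim, num_tables, bucket_width, seed=0):
        rng = random.Random(seed)
        self.w = bucket_width
        # One Gaussian (2-stable) projection vector and one offset per table.
        self.projections = [
            [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_tables)
        ]
        self.offsets = [rng.uniform(0.0, bucket_width) for _ in range(num_tables)]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.points = []

    def _hash(self, t, v):
        # h(v) = floor((a . v + b) / w) for table t.
        dot = sum(a * x for a, x in zip(self.projections[t], v))
        return math.floor((dot + self.offsets[t]) / self.w)

    def add(self, v):
        # Store the point and register its index in one bucket per table.
        idx = len(self.points)
        self.points.append(list(v))
        for t, table in enumerate(self.tables):
            table[self._hash(t, v)].append(idx)
        return idx

    def candidates(self, q, m):
        # Count, for each stored point, in how many tables it shares a bucket with q.
        counts = Counter()
        for t, table in enumerate(self.tables):
            for idx in table.get(self._hash(t, q), ()):
                counts[idx] += 1
        # Keep only points that collide with the query more than m times.
        return [idx for idx, c in counts.items() if c > m]

    def query(self, q, m):
        # Assumed final step: re-rank the candidates by exact Euclidean distance.
        cands = self.candidates(q, m)
        if not cands:
            return None
        return min(cands, key=lambda i: math.dist(q, self.points[i]))
```

In this sketch each table stores a single integer key per point, and raising m trades recall for fewer exact distance computations. A point identical to the query collides in every table, so it passes any threshold below `num_tables`.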
Keywords :
data mining; file organisation; information retrieval; statistical distributions; ANN problem; FBLSH; LSH hash functions; NN search; approximate nearest neighbor problem; frequency based locality sensitive hashing; frequency threshold; generic LSH scheme; hash table; high dimensional space; nearest neighbor search; p-stable distributions; query point; search quality; Algorithm design and analysis; Artificial neural networks; Indexes; Memory management; Nearest neighbor searches; Random access memory; Time frequency analysis; Information Retrieval; LSH; Similarity Search;
Conference_Title :
Multimedia Technology (ICMT), 2011 International Conference on
Conference_Location :
Hangzhou
Print_ISBN :
978-1-61284-771-9
DOI :
10.1109/ICMT.2011.6002015