DocumentCode :
2078179
Title :
An Efficient Similarity Searching Scheme in Massive Databases
Author :
Shen, Haiying ; Li, Ting ; Schweiger, Tom
Author_Institution :
Dept. of Comput. Sci. & Comput. Eng., Univ. of Arkansas, Fayetteville, AR
fYear :
2008
fDate :
June 29 2008-July 5 2008
Firstpage :
47
Lastpage :
52
Abstract :
Locality sensitive hashing (LSH) is a method of performing probabilistic dimension reduction of high-dimensional data. It is a popular technique for approximate nearest neighbor search. However, LSH needs large memory space and long processing time to achieve good performance when searching a massive dataset. In addition, it is not effective at locating similar data in a very high-dimensional dataset. This paper proposes a new LSH-based similarity searching scheme, namely SMLSH, which combines a consistent hash function and min-wise independent permutations with LSH. SMLSH classifies information according to similarity with a reduced memory space requirement and in an efficient manner, and it can quickly locate similar data in a massive dataset. Experimental results show that SMLSH is both time and space efficient in comparison with LSH, and that it yields significant improvements in the effectiveness of similarity searching over LSH in a massive dataset.
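Illustrative note: the abstract names min-wise independent permutations and LSH but does not describe how SMLSH combines them. The following is a minimal Python sketch of the generic technique only (min-wise hashing signatures grouped into LSH bands), not the paper's SMLSH scheme. All function names, the modulus, and the use of random linear hash functions as stand-ins for min-wise independent permutations are assumptions made for illustration.

```python
import random

# Assumption: a Mersenne prime modulus; items must be integers in [0, PRIME).
PRIME = 2_147_483_647


def make_hash_params(num_hashes, seed=0):
    """Draw (a, b) pairs for hash functions h(x) = (a*x + b) mod PRIME.

    With a != 0 each h is a permutation of [0, PRIME); such linear maps
    only approximate min-wise independence.
    """
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minhash_signature(items, params):
    """Signature = minimum hash value of the set under each hash function."""
    return [min((a * x + b) % PRIME for x in items) for a, b in params]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree (estimates Jaccard similarity)."""
    matches = sum(1 for u, v in zip(sig_a, sig_b) if u == v)
    return matches / len(sig_a)


def band_keys(signature, bands):
    """Split a signature into `bands` equal chunks; each chunk is a bucket key.

    Two sets become candidate neighbors if they share at least one key.
    """
    rows = len(signature) // bands
    return [(i, tuple(signature[i * rows:(i + 1) * rows]))
            for i in range(bands)]


def jaccard(a, b):
    """Exact Jaccard similarity, for comparison with the estimate."""
    return len(a & b) / len(a | b)
```

In use, each record's `band_keys` would be inserted into a hash table, and a query would compare itself only against records that share a bucket, with `estimate_jaccard` or `jaccard` ranking those candidates.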
Keywords :
data reduction; minimisation; probability; search problems; very large databases; LSH-based similarity searching; approximate nearest neighbor search; consistent hash function; efficient similarity searching; high dimensional data; locality sensitive hashing; massive databases; min-wise independent permutation; probabilistic dimension reduction; Computer science; Costs; Data engineering; Data structures; Databases; Delay; High performance computing; Nearest neighbor searches; Telecommunication computing; Tree data structures; Locality Sensitive Hashing (LSH); Min-Wise Independent Permutations; Similarity Search;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Digital Telecommunications, 2008. ICDT '08. The Third International Conference on
Conference_Location :
Bucharest
Print_ISBN :
978-0-7695-3188-5
Electronic_ISBN :
978-0-7695-3188-5
Type :
conf
DOI :
10.1109/ICDT.2008.12
Filename :
4561284