Title :
Distributed online similarity search in high dimensional space
Author :
Baohui Li ; Kefu Xu ; Hongtao Xie
Author_Institution :
Sch. of Comput. Sci., Beijing Univ. of Posts & Telecommun., Beijing, China
Abstract :
In this paper, we consider distributed online similarity search for big data in high dimensional spaces, for which Locality Sensitive Hashing (LSH) has been the method of choice. However, LSH requires a rather large number of hash tables and carefully chosen parameters, which makes it difficult to handle big data on a single machine. To reduce the size of the data each machine must handle, we use a random projection tree (RP-tree) to divide the dataset into well-separated clusters with bounded aspect ratios and place them on different peers in a ring network. To limit the number of network accesses, we place similar subgroups adjacent to each other. We then construct one LSH hash table for each subgroup using optimal parameters. Comprehensive performance evaluations on real-world data show that our approach decreases the network cost and brings a major performance improvement, while maintaining a good load balance across machines.
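The pipeline outlined in the abstract (partition the data with an RP-tree, keep similar subgroups on neighbouring peers, build one LSH table per subgroup) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the plain median split, the left-to-right leaf ordering used as ring order, the p-stable style hash h(v) = floor((a.v + b) / w), and the names `rp_split`, `rp_partition`, `LshTable`, and `build_peer_tables` are all assumptions made for illustration, and the parameters `k` and `w` are arbitrary rather than optimal.

```python
import math
import random
from collections import defaultdict


def rp_split(points, rng):
    """Split points at the median of their projections onto a random direction."""
    dim = len(points[0])
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    proj = [sum(d * x for d, x in zip(direction, p)) for p in points]
    order = sorted(range(len(points)), key=lambda i: proj[i])
    half = len(points) // 2
    return [points[i] for i in order[:half]], [points[i] for i in order[half:]]


def rp_partition(points, depth, rng):
    """Recursively split into at most 2**depth leaves, listed left to right.

    Assumption: sibling leaves are adjacent in the list, so assigning leaf i
    to peer i keeps nearby subgroups on neighbouring peers of the ring.
    """
    if depth == 0 or len(points) < 2:
        return [points]
    left, right = rp_split(points, rng)
    return rp_partition(left, depth - 1, rng) + rp_partition(right, depth - 1, rng)


class LshTable:
    """A single hash table keyed by k hashes h(v) = floor((a.v + b) / w)."""

    def __init__(self, dim, k, w, rng):
        self.w = w
        self.funcs = [
            ([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
            for _ in range(k)
        ]
        self.buckets = defaultdict(list)

    def key(self, v):
        return tuple(
            math.floor((sum(a_i * x for a_i, x in zip(a, v)) + b) / self.w)
            for a, b in self.funcs
        )

    def insert(self, v):
        self.buckets[self.key(v)].append(v)

    def query(self, v):
        # Candidates are the points that share the query's bucket.
        return list(self.buckets.get(self.key(v), []))


def build_peer_tables(leaves, k, w, rng):
    """Build one LSH table per leaf; table i stands in for peer i on the ring."""
    dim = len(next(leaf for leaf in leaves if leaf)[0])
    tables = []
    for leaf in leaves:
        table = LshTable(dim, k, w, rng)
        for point in leaf:
            table.insert(point)
        tables.append(table)
    return tables


rng = random.Random(0)
data = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(256)]
leaves = rp_partition(data, depth=3, rng=rng)
peers = build_peer_tables(leaves, k=4, w=4.0, rng=rng)
print(len(peers), [len(leaf) for leaf in leaves])
```

In this sketch a query would be routed to the peer whose leaf it falls into and, if needed, to that peer's ring neighbours; the routing step is omitted because the abstract does not describe it.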
Keywords :
Big Data; data reduction; distributed processing; information retrieval; network theory (graphs); pattern matching; trees (mathematics); LSH hash table; LSH scheme; RP-tree; big data size reduction; distributed online similarity search; high dimensional spaces; locality sensitive hashing method; optimal parameters; performance improvement; random projection tree; ring network; Approximation algorithms; Approximation methods; Data handling; Data storage systems; Information management; Peer-to-peer computing; Servers; distributed; high dimensional; locality sensitive search; similarity search;
Conference_Title :
Big Data and Smart Computing (BIGCOMP), 2014 International Conference on
Conference_Location :
Bangkok
DOI :
10.1109/BIGCOMP.2014.6741437