Title :
A GPGPU Algorithm for c-Approximate r-Nearest Neighbor Search in High Dimensions
Author :
Carraher, Lee A. ; Wilsey, Philip A. ; Annexstein, Fred S.
Author_Institution :
Sch. of Electron. & Comput. Syst., Univ. of Cincinnati, Cincinnati, OH, USA
Abstract :
Nearest Neighbor search is one of the simplest and most intuitive ideas in data mining. Due to its simplicity and diverse utility, Nearest Neighbor search is often found to be the workhorse of a variety of data mining, machine learning, and computer vision algorithms. For very high-dimensional data, the naive linear search tends to be optimal. This is due to the so-called curse of dimensionality in database search. To accelerate the search time, many researchers have turned to the power and efficiency of GPGPU computing, mainly for computing the data-intensive distance metrics. Though promising speedups are achieved through this method, it fails to capitalize on many of the recent advances in algorithms for bounded-error approximate nearest neighbor searches. Approximate nearest neighbor methods give satisfactory results, especially in very large and possibly redundant datasets, while boasting sub-linear search complexities. In this paper we present a c-approximate r-nearest neighbor search algorithm for CUDA using Locality Sensitive Hash with Nearest Neighbor search (LSH-NN). We implement this system in CUDA and test it on real-world image SIFT vector data. Our tests show that we are able to achieve significant speedup over the serial version, with good approximations on scalability, while achieving the near-optimal search complexity of LSH-NN.
Keywords :
computational complexity; data mining; graphics processing units; learning (artificial intelligence); parallel architectures; search problems; CUDA; GPGPU algorithm; GPGPU computing; LSH-NN; approximate nearest neighbor methods; bounded error approximate nearest neighbor searches; c-approximate r-nearest neighbor search algorithm; computer vision algorithms; data intensive distance metrics; data mining; database search; diverse utility; high dimensional data; locality sensitive hash with nearest neighbor search; machine learning; naive linear search; near-optimal search complexity; real world image SIFT vector data; redundant datasets; serial version; significant speedup; sublinear search complexity; Complexity theory; Databases; Decoding; Instruction sets; Lattices; Nearest neighbor searches; Vectors; GPGPU; KNN; LSH; Nearest Neighbor;
Conference_Title :
Parallel and Distributed Processing Symposium Workshops & PhD Forum (IPDPSW), 2013 IEEE 27th International
Conference_Location :
Cambridge, MA
Print_ISBN :
978-0-7695-4979-8
DOI :
10.1109/IPDPSW.2013.239