• DocumentCode
    2776709
  • Title

    Improving kernel locality-sensitive hashing using pre-images and bounds

  • Author

    Bodó, Zalán ; Csató, Lehel

  • Author_Institution
    Dept. of Comput. Sci., Babes-Bolyai Univ., Cluj-Napoca, Romania
  • fYear
    2012
  • fDate
    10-15 June 2012
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    Large databases are increasingly common in (supervised) learning scenarios, containing hundreds of thousands or even millions of training examples. Finding the k-nearest neighbors (k-NN) of a point in a dataset, however, requires comparing the point to every training example. Locality-sensitive hashing (LSH) [11], [7], [3] hashes the dataset into buckets such that, with high probability, similar examples are grouped together, thus providing sub-linear search time for neighbors. However, linear k-NN is sometimes not enough: the Euclidean distance does not always capture important data properties, so kernels are used to map the data into a - possibly higher-dimensional - feature space and perform the k-NN search there. To kernelize the LSH scheme of [3], the most important question to answer is how to generate random normally distributed vectors in the feature space. In this paper we present an improved kernel LSH technique, a modified version of the kLSH algorithm proposed in [12]. We compute the pre-images of the random feature-space vectors to save important computational resources. Our pre-image calculation is appealing because it requires no additional intrinsic computations. Furthermore, for positive definite kernel functions we propose two inequalities to speed up the search.
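    For orientation, the linear LSH family that the abstract builds on ([3]) is commonly realized as sign random projections: each hash bit records which side of a random Gaussian hyperplane a point falls on, so similar points tend to share hash codes and land in the same bucket. The snippet below is a minimal sketch of that idea only (all names are ours); it is not the kernelized pre-image variant the paper proposes.

    ```python
    import numpy as np

    def lsh_hash(X, W):
        # Sign random-projection LSH: each row of W is a random Gaussian
        # hyperplane, and a point's b-bit code records on which side of
        # each hyperplane it falls.
        return (X @ W.T >= 0).astype(int)   # shape (n_points, n_bits)

    rng = np.random.default_rng(0)
    d, b = 5, 8                       # input dimension, number of hash bits
    W = rng.standard_normal((b, d))   # b random hyperplanes in R^d

    x = rng.standard_normal(d)
    x_near = x + 0.01 * rng.standard_normal(d)  # slight perturbation of x
    x_far = -x                                  # point in the opposite direction

    hx, hn, hf = lsh_hash(np.vstack([x, x_near, x_far]), W)
    # Nearby points share most (often all) hash bits, so they land in the
    # same or adjacent buckets; the opposite point disagrees on every bit.
    print((hx == hn).sum(), (hx == hf).sum())
    ```

    Kernelizing this scheme requires drawing the hyperplanes W as (approximately) Gaussian directions in the kernel-induced feature space, which is exactly the question the paper addresses via pre-images.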
  • Keywords
    image classification; learning (artificial intelligence); probability; search problems; computational resource; dataset; distributed vector; improved kernel LSH technique; k-nearest neighbor; kLSH algorithm; kernel function; kernel locality-sensitive hashing; large database; linear k-NN; preimage calculation; random feature space vector; sublinear search time; supervised learning; Clustering algorithms; Indexes; Kernel; Testing; Training; Vectors;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Neural Networks (IJCNN), The 2012 International Joint Conference on
  • Conference_Location
    Brisbane, QLD
  • ISSN
    2161-4393
  • Print_ISBN
    978-1-4673-1488-6
  • Electronic_ISBN
    2161-4393
  • Type
    conf
  • DOI
    10.1109/IJCNN.2012.6252742
  • Filename
    6252742