• DocumentCode
    2776709
  • Title

    Improving kernel locality-sensitive hashing using pre-images and bounds

  • Author

    Bodó, Zalán ; Csató, Lehel

  • Author_Institution
    Dept. of Comput. Sci., Babes-Bolyai Univ., Cluj-Napoca, Romania
  • fYear
    2012
  • fDate
    10-15 June 2012
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    Large databases are increasingly common in (supervised) learning scenarios, containing hundreds of thousands or even millions of training examples. Finding the k-nearest neighbors (k-NN) of a point in a dataset, however, requires comparing the point to every training example. Locality-sensitive hashing (LSH) [11], [7], [3] hashes the dataset into buckets such that, with high probability, similar examples are grouped together, thus providing sub-linear search time for neighbors. However, linear k-NN is sometimes not enough: the Euclidean distance does not always capture important data properties, so kernels are used to map the data into a - possibly higher-dimensional - feature space and perform the k-NN search there. To kernelize the LSH scheme of [3], the most important question to answer is how to generate random normally distributed vectors in the feature space. In this paper we present an improved kernel LSH technique, a modified version of the kLSH algorithm proposed in [12]. We compute the pre-images of the random feature-space vectors to save important computational resources. Our pre-image calculation is appealing because it requires no additional intrinsic computations. Furthermore, for positive definite kernel functions we propose two inequalities to speed up the search.
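    For orientation, the linear LSH family that the abstract builds on ([3]) is commonly realized as sign random projections: each hash bit records which side of a random Gaussian hyperplane a point falls on, so similar points tend to share hash codes and land in the same bucket. The snippet below is a minimal sketch of that idea only (all names are ours); it is not the kernelized pre-image variant the paper proposes.

    ```python
    import numpy as np

    def lsh_hash(X, W):
        # Sign random-projection LSH: each row of W is a random Gaussian
        # hyperplane, and a point's b-bit code records on which side of
        # each hyperplane it falls.
        return (X @ W.T >= 0).astype(int)   # shape (n_points, n_bits)

    rng = np.random.default_rng(0)
    d, b = 5, 8                       # input dimension, number of hash bits
    W = rng.standard_normal((b, d))   # b random hyperplanes in R^d

    x = rng.standard_normal(d)
    x_near = x + 0.01 * rng.standard_normal(d)  # slight perturbation of x
    x_far = -x                                  # point in the opposite direction

    hx, hn, hf = lsh_hash(np.vstack([x, x_near, x_far]), W)
    # Nearby points share most (often all) hash bits, so they land in the
    # same or adjacent buckets; the opposite point disagrees on every bit.
    print((hx == hn).sum(), (hx == hf).sum())
    ```

    Kernelizing this scheme requires drawing the hyperplanes W as (approximately) Gaussian directions in the kernel-induced feature space, which is exactly the question the paper addresses via pre-images.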
  • Keywords
    image classification; learning (artificial intelligence); probability; search problems; computational resource; dataset; distributed vector; improved kernel LSH technique; k-nearest neighbor; kLSH algorithm; kernel function; kernel locality-sensitive hashing; large database; linear k-NN; preimage calculation; random feature space vector; sublinear search time; supervised learning; Clustering algorithms; Indexes; Kernel; Testing; Training; Vectors;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Neural Networks (IJCNN), The 2012 International Joint Conference on
  • Conference_Location
    Brisbane, QLD
  • ISSN
    2161-4393
  • Print_ISBN
    978-1-4673-1488-6
  • Electronic_ISBN
    2161-4393
  • Type
    conf
  • DOI
    10.1109/IJCNN.2012.6252742
  • Filename
    6252742