• DocumentCode
    2887411
  • Title

    Large Scale KNN-Graph Approximation

  • Author

    Trad, M.R. ; Joly, A. ; Boujemaa, N.

  • Author_Institution
    INRIA Paris-Rocquencourt, Le Chesnay, France
  • fYear
    2012
  • fDate
    10 Dec. 2012
  • Firstpage
    439
  • Lastpage
    448
  • Abstract
    Efficiently constructing the K-Nearest Neighbor Graph (K-NNG) of large, high-dimensional datasets is crucial for many applications involving feature-rich objects, such as images or other multimedia content. In this paper, we investigate the use of high-dimensional hashing methods for efficiently approximating the K-NNG in distributed environments. We first discuss how load balancing affects the performance of such approaches and show why the baseline approach using Locality Sensitive Hashing (LSH) does not perform well. Our new KNN-join method is based on RMMH, a recently introduced hash function family based on randomly trained classifiers. We show that the resulting hash tables are much more balanced and that the number of resulting collisions can be greatly reduced without degrading quality. We further improve the load balancing of our distributed approach by designing a parallelized local join algorithm. We show that our method outperforms the state of the art in centralized settings and that it scales efficiently thanks to its inherently distributed design. Finally, we present a distributed implementation of our method using a MapReduce framework and evaluate its performance on a large dataset.
  • Keywords
    approximation theory; data structures; distributed processing; graph theory; pattern classification; search problems; software performance evaluation; K-NNG; KNN-join method; MapReduce framework; balancing issues; distributed design; distributed environments; hash function; hash tables; high dimensional datasets; high dimensional hashing methods; k-nearest neighbor graph; large scale KNN-graph approximation; multimedia content; nearest neighbors search problem; parallelized local join algorithm; Approximation algorithms; Approximation methods; Feature extraction; Load management; Multimedia communication; Training; Vectors; Approximate; Distributed; Hashing; KNN-Graph; MapReduce; Scalable; Similarity Search;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Data Mining Workshops (ICDMW), 2012 IEEE 12th International Conference on
  • Conference_Location
    Brussels
  • Print_ISBN
    978-1-4673-5164-5
  • Type

    conf

  • DOI
    10.1109/ICDMW.2012.35
  • Filename
    6406473
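
The hash-then-local-join idea summarized in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' method: it substitutes plain random-hyperplane LSH for RMMH, runs in one process rather than on MapReduce, and every name and parameter (`hash_key`, `approx_knn_graph`, `n_tables`, `n_bits`) is hypothetical. Points that collide in a bucket are compared pairwise (the "local join"), and each point keeps its k closest candidates across all tables.

```python
import heapq
import random
from collections import defaultdict


def hash_key(vec, hyperplanes):
    """One bit per hyperplane: True when the dot product is non-negative."""
    return tuple(
        sum(v * h for v, h in zip(vec, plane)) >= 0 for plane in hyperplanes
    )


def approx_knn_graph(points, k, n_tables=8, n_bits=4, seed=0):
    """Approximate K-NN graph via random-hyperplane hashing (assumed stand-in
    for RMMH) followed by a brute-force join inside each bucket."""
    rng = random.Random(seed)
    dim = len(points[0])
    # candidates[i] maps neighbor index -> squared Euclidean distance,
    # which deduplicates pairs that collide in several tables.
    candidates = [dict() for _ in points]

    for _ in range(n_tables):
        planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]
        buckets = defaultdict(list)
        for i, p in enumerate(points):
            buckets[hash_key(p, planes)].append(i)

        # Local join: compare only points that share a bucket.
        for members in buckets.values():
            for pos, a in enumerate(members):
                for b in members[pos + 1:]:
                    d = sum((x - y) ** 2 for x, y in zip(points[a], points[b]))
                    candidates[a][b] = d
                    candidates[b][a] = d

    # Keep the k closest candidates per point, nearest first.
    return [
        [j for _, j in heapq.nsmallest(k, ((d, j) for j, d in c.items()))]
        for c in candidates
    ]
```

The cost of the local join grows quadratically with bucket size, so a few oversized buckets dominate the running time; this is the balancing concern the abstract raises. Setting `n_bits=0` in this sketch places every point in a single bucket and degenerates to an exact brute-force join.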