• DocumentCode
    3322058
  • Title
    Nearest Neighbor Retrieval Using Distance-Based Hashing
  • Author
    Athitsos, Vassilis ; Potamias, Michalis ; Papapetrou, Panagiotis ; Kollios, George
  • Author_Institution
    Comput. Sci. & Eng. Dept., Univ. of Texas at Arlington, Arlington, TX
  • fYear
    2008
  • fDate
    7-12 April 2008
  • Firstpage
    327
  • Lastpage
    336
  • Abstract
    A method is proposed for indexing spaces with arbitrary distance measures in order to achieve efficient approximate nearest neighbor retrieval. Hashing methods such as locality-sensitive hashing (LSH) have been applied successfully to similarity indexing in vector spaces and in string spaces under the Hamming distance. The key novelty of the hashing technique proposed here is that it applies to spaces with arbitrary distance measures, including non-metric ones. First, we describe a domain-independent method for constructing a family of binary hash functions. Then, we use these functions to construct multiple multibit hash tables. We show that the LSH formalism is not applicable to analyzing the behavior of these tables as index structures. Instead, we present a novel formulation that uses statistical observations from sample data to analyze the retrieval accuracy and efficiency of the proposed indexing method. Experiments on several real-world data sets demonstrate that our method produces good trade-offs between accuracy and efficiency and significantly outperforms VP-trees, a well-known method for distance-based indexing.
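The abstract does not spell out how the binary hash functions are built. As a hedged illustration only, the Python sketch below assumes one distance-only construction: each object is projected onto the "line" through two randomly chosen pivot objects using nothing but distances, and a bit is set to 1 when that projection falls inside an interval [t1, t2] covering about half of a sample. It then concatenates k such bits into the key for each of l hash tables, and answers a query by computing exact distances only to the union of database objects that collide with it. The names `make_binary_hash`, `DBHIndex`, `k`, `l`, and `seed` are hypothetical, and using the whole database as the threshold sample is a simplifying assumption rather than a detail taken from this record.

```python
import random
from collections import defaultdict
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple

Distance = Callable[[Any, Any], float]


def make_binary_hash(dist: Distance, sample: Sequence[Any],
                     rng: random.Random) -> Callable[[Any], int]:
    """Hypothetical binary hash built from a pivot-pair line projection (assumption)."""
    items = list(sample)
    # Pick two pivots at nonzero distance; give up if the sample looks degenerate.
    for _ in range(100):
        x1, x2 = rng.sample(items, 2)
        d12 = dist(x1, x2)
        if d12 > 0:
            break
    else:
        raise ValueError("could not find two pivots at nonzero distance")

    def project(x: Any) -> float:
        # Position of x along the line through x1 and x2, using distances only,
        # so it works for any distance function, metric or not.
        return (dist(x, x1) ** 2 + d12 ** 2 - dist(x, x2) ** 2) / (2 * d12)

    # Choose a random interval [t1, t2] that contains about half of the sample projections.
    values = sorted(project(s) for s in items)
    half = max(1, len(values) // 2)
    start = rng.randrange(0, len(values) - half + 1)
    t1, t2 = values[start], values[start + half - 1]
    return lambda x: 1 if t1 <= project(x) <= t2 else 0


class DBHIndex:
    """Hypothetical index: l hash tables, each keyed by k concatenated binary hashes."""

    def __init__(self, dist: Distance, database: Sequence[Any],
                 k: int, l: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.dist = dist
        self.database = list(database)
        self.tables: List[Tuple[List[Callable[[Any], int]], Dict[Tuple[int, ...], List[int]]]] = []
        for _ in range(l):
            funcs = [make_binary_hash(dist, self.database, rng) for _ in range(k)]
            buckets: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
            for idx, obj in enumerate(self.database):
                buckets[self._key(funcs, obj)].append(idx)
            self.tables.append((funcs, buckets))

    @staticmethod
    def _key(funcs: List[Callable[[Any], int]], x: Any) -> Tuple[int, ...]:
        # A k-bit bucket key: one bit per binary hash function.
        return tuple(h(x) for h in funcs)

    def query(self, q: Any) -> Optional[Any]:
        # Gather every database object that shares a bucket with q in any table.
        candidates = set()
        for funcs, buckets in self.tables:
            candidates.update(buckets.get(self._key(funcs, q), []))
        if not candidates:
            return None  # approximate retrieval: q may collide with nothing
        # Exact distances are computed only for the candidates.
        best = min(candidates, key=lambda i: self.dist(q, self.database[i]))
        return self.database[best]


if __name__ == "__main__":
    # Toy data: 1-D points under absolute difference (any distance function could be used).
    data = [float(i) for i in range(100)]
    index = DBHIndex(lambda a, b: abs(a - b), data, k=3, l=5, seed=1)
    print(index.query(42.3))
```

Larger k makes buckets smaller (fewer distance computations, more missed neighbors), while larger l adds tables that recover some of those misses; the returned object is only an approximate nearest neighbor, and `query` can return `None` when nothing collides.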
  • Keywords
    file organisation; information retrieval; statistical analysis; Hamming distance; arbitrary distance measures; binary hash functions; distance-based hashing; domain-independent method; index structures; locality sensitive hashing; nearest neighbor retrieval; statistical methods; Computer science; Content based retrieval; Extraterrestrial measurements; Indexing; Multimedia databases; Nearest neighbor searches; Particle measurements
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    IEEE 24th International Conference on Data Engineering (ICDE 2008)
  • Conference_Location
    Cancun
  • Print_ISBN
    978-1-4244-1836-7
  • Electronic_ISBN
    978-1-4244-1837-4
  • Type
    conf
  • DOI
    10.1109/ICDE.2008.4497441
  • Filename
    4497441