Regional Information Center for Science and Technology (RICEST) - A Heterogeneous High-Dimensional Approximate Nearest Neighbor Algorithm

DocumentCode :

1544186

Title :

A Heterogeneous High-Dimensional Approximate Nearest Neighbor Algorithm

Author :

Dubiner, Moshe

Author_Institution :

Google, Cupertino, CA, USA

Volume :

Issue :

Year :

2012

Firstpage :

6646

Lastpage :

6658

Abstract :

We consider the problem of finding high-dimensional approximate nearest neighbors. We introduce an old-style probabilistic formulation instead of the more general locality-sensitive hashing (LSH) formulation, and show that, at least for sparse problems, it admits much more efficient algorithms than the sparseness-destroying LSH random projections. Efficient algorithms for homogeneous problems (all coordinates have the same probability distribution) are well known, the most famous reference being the work by Broder in 1998. The main theme of this paper is to find the “best” generalization of that approach to heterogeneous problems (different coordinate probabilities). We find a practical algorithm that is asymptotically best in a wide natural class of algorithms. Readers interested in the more complicated very best algorithm (at least to date) can consult our previous work from 2010. The analysis of our algorithm reveals that its complexity is governed by an information-like function, which we call “small leaves bucketing forest information.” Any doubts whether it is “information” are dispelled by the aforementioned work.

Keywords :

approximation theory; computational complexity; file organisation; pattern clustering; probability; LSH; approximate nearest neighbor algorithm; complexity; heterogeneous problem; information like function; locality sensitive hashing; sparse problem; Algorithm design and analysis; Approximation algorithms; Dictionaries; Information theory; Probabilistic logic; Vectors; Vegetation; Clustering algorithms; nearest neighbor searches

Language :

English

Journal_Title :

Information Theory, IEEE Transactions on

Publisher :

IEEE

ISSN :

0018-9448

Type :

jour

DOI :

10.1109/TIT.2012.2204169

Filename :

6220882

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1544186