Title :
Bucketing Coding and Information Theory for the Statistical High-Dimensional Nearest-Neighbor Problem
Author_Institution :
Google, Inc., Mountain View, CA, USA
Abstract :
The problem of finding high-dimensional approximate nearest neighbors is considered when the data is generated by some known probabilistic model. A large natural class of algorithms (bucketing codes) is investigated. Bucketing information is defined and proven to bound the performance of all bucketing codes; this bound is asymptotically attained by some randomly constructed bucketing codes. The example of n Bernoulli(1/2) very long (length d → ∞) bit sequences is singled out. It is assumed that n − 2m sequences are completely independent, while the remaining 2m sequences are composed of m dependent pairs. The interdependence within each pair is that their bits agree with probability 1/2 < p ≤ 1. It is well known how to find most pairs with high probability by performing on the order of n^{log₂(2/p)} comparisons. It is shown that on the order of n^{1/p+ε} comparisons suffice, for any ε > 0. A specific two-dimensional inequality (proven in another paper) implies that the exponent 1/p cannot be lowered. Moreover, if one sequence out of each pair belongs to a known set of n^{(2p−1)²} sequences, pairing can be done using on the order of n^{1+ε} comparisons!
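The following is a minimal sketch, for illustration only, of the classic bit-sampling bucketing scheme that the abstract cites as the well-known order-n^{log₂(2/p)} method: hash each sequence by k randomly chosen coordinates, compare only sequences landing in the same bucket, and repeat over independent rounds. It is not the paper's bucketing-code construction (which attains the n^{1/p+ε} exponent); all names and parameter values (n, m, d, p, k, rounds) are illustrative assumptions.

```python
# Sketch of bit-sampling bucketing for the Bernoulli(1/2) pairing problem
# described in the abstract.  Illustrative baseline only, not the paper's
# improved bucketing codes.  Parameter values are assumptions.
import math
import random
from collections import defaultdict


def make_data(n=1000, m=50, d=256, p=0.8, seed=0):
    """n length-d Bernoulli(1/2) bit sequences: n - 2m independent, plus m
    dependent pairs whose bits agree with probability p."""
    rng = random.Random(seed)
    seqs = [[rng.getrandbits(1) for _ in range(d)] for _ in range(n - m)]
    true_pairs = set()
    for i in range(m):
        partner = n - 2 * m + i                      # an existing sequence
        twin = [b if rng.random() < p else 1 - b for b in seqs[partner]]
        true_pairs.add((partner, len(seqs)))
        seqs.append(twin)
    return seqs, true_pairs


def find_pairs(seqs, p):
    """Bucket by k randomly sampled coordinates; compare only within buckets."""
    n, d = len(seqs), len(seqs[0])
    k = max(1, int(math.log2(n)))                    # hash length ~ log2(n) bits
    rounds = math.ceil(3 * (1.0 / p) ** k)           # ~n^{log2(1/p)} rounds
    threshold = (0.5 + p) / 2                        # separates pairs from noise
    found, comparisons = set(), 0
    for _ in range(rounds):
        coords = random.sample(range(d), k)
        buckets = defaultdict(list)
        for idx, s in enumerate(seqs):
            buckets[tuple(s[c] for c in coords)].append(idx)
        for members in buckets.values():
            for i in range(len(members)):
                for j in range(i + 1, len(members)):
                    a, b = members[i], members[j]
                    comparisons += 1
                    agree = sum(x == y for x, y in zip(seqs[a], seqs[b])) / d
                    if agree >= threshold:
                        found.add((min(a, b), max(a, b)))
    return found, comparisons


if __name__ == "__main__":
    seqs, truth = make_data()
    found, comps = find_pairs(seqs, p=0.8)
    print(f"recovered {len(found & truth)}/{len(truth)} pairs using {comps} comparisons")
```

With k ≈ log₂ n, the expected number of spurious same-bucket collisions per round stays near n, and roughly (1/p)^k ≈ n^{log₂(1/p)} rounds are needed to catch most correlated pairs, so the total work scales as n^{log₂(2/p)}, the baseline quoted in the abstract; the paper's bucketing codes improve this exponent to 1/p + ε.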
Keywords :
encoding; learning (artificial intelligence); pattern classification; probability; sequences; 2D inequality; 2m sequences; bucketing coding; bucketing information; dependent pairs; high-dimensional approximate nearest neighbors; information theory; n - 2m sequences; n Bernoulli (1/2) very long bit sequences; probabilistic model; statistical high-dimensional nearest-neighbor problem; Hamming distance; Information theory; Mutual information; Nearest neighbor searches; Pattern recognition; Probability; Statistical learning; Symmetric matrices; Approximate nearest neighbor; information theory;
Journal_Title :
Information Theory, IEEE Transactions on
DOI :
10.1109/TIT.2010.2050814