Abstract :
Randomized partition trees have recently been shown, in theory, to be very effective for high-dimensional nearest neighbor search. In this paper, we introduce a variant of randomized partition trees for the high-dimensional nearest neighbor search problem and provide theoretical justification for its choice. Experiments on various real-life datasets show that this new variant outperforms both the previous variant and the locality sensitive hashing (LSH) method for nearest neighbor search. In addition, we establish the connection between recently introduced notions of difficulty in the nearest neighbor search problem, namely the potential function and relative contrast.
Keywords :
file organisation; pattern classification; query formulation; LSH method; high dimensional nearest neighbor search problem; locality sensitive hashing; randomized partition trees; Accuracy; Covariance matrices; Data structures; Nearest neighbor searches; Principal component analysis; Standards; Vectors; RP Trees; nearest neighbor search