Title :
PAC nearest neighbor queries: Approximate and controlled search in high-dimensional and metric spaces
Author :
Ciaccia, Paolo ; Patella, Marco
Author_Institution :
Dipt. di Elettronica Inf. e Sistemistica, Bologna Univ., Italy
Abstract :
In high-dimensional and complex metric spaces, determining the nearest neighbor (NN) of a query object q can be very expensive, because index structures partition such spaces poorly (the so-called “curse of dimensionality”). This also affects approximately correct (AC) algorithms, which return a point whose distance from q is less than (1+ε) times the distance between q and its true NN. In this paper we introduce a new approach to approximate similarity search, called PAC-NN queries, in which the error bound ε can be exceeded with probability at most δ, and both ε and δ can be tuned at query time to trade the quality of the result for the cost of the search. We describe sequential and index-based PAC-NN algorithms that exploit the distance distribution of the query object to determine a stopping condition that respects the error bound. Analysis and experimental evaluation of the sequential algorithm confirm that, for moderately large data sets and suitable ε and δ values, PAC-NN queries can be solved efficiently and their error controlled. We then provide experimental evidence that indexing can further speed up retrieval by up to one to two orders of magnitude without sacrificing the accuracy of the result.
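The abstract only says that the stopping condition is derived from the query's distance distribution. The sketch below is one plausible reading of the sequential case under stated assumptions, not the authors' code. It assumes the n data points behave like i.i.d. draws from the query's distance distribution F_q, so the NN distance has CDF G_q(r) = 1 - (1 - F_q(r))^n. It then takes r_δ = G_q^{-1}(δ) and stops the scan as soon as a point within (1+ε)·r_δ is found: with probability at least 1 - δ the true NN distance is at least r_δ, so the returned point satisfies the AC bound. The names `nn_radius`, `empirical_cdf_inverse` and `pac_nn_sequential` are hypothetical.

```python
import math
from typing import Callable, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def nn_radius(cdf_inverse: Callable[[float], float], n: int, delta: float) -> float:
    """Radius r_delta such that Pr[NN distance < r_delta] = delta.

    Assumption: the n points are i.i.d. with distance-from-q CDF F_q, so
    G_q(r) = 1 - (1 - F_q(r))**n. Solving G_q(r) = delta gives
    F_q(r) = 1 - (1 - delta)**(1/n); cdf_inverse maps that back to a radius.
    """
    return cdf_inverse(1.0 - (1.0 - delta) ** (1.0 / n))


def empirical_cdf_inverse(sample_dists: Sequence[float]) -> Callable[[float], float]:
    """Plug-in quantile estimate from sampled distances d(q, x) (an assumption).

    inverse(p) returns the k-th smallest sampled distance, k = floor(p * m).
    When the sample is too small to resolve p (k == 0) it returns 0.0,
    which simply disables early stopping.
    """
    ordered = sorted(sample_dists)

    def inverse(p: float) -> float:
        k = math.floor(p * len(ordered))
        return ordered[k - 1] if k > 0 else 0.0

    return inverse


def pac_nn_sequential(
    query: T,
    data: Sequence[T],
    dist: Callable[[T, T], float],
    eps: float,
    r_delta: float,
) -> Tuple[Optional[T], float, int]:
    """Sequential scan that stops once some point lies within (1 + eps) * r_delta.

    Returns (best point, its distance, number of distances computed).
    If the stopping test never fires, this is an exact linear scan.
    """
    threshold = (1.0 + eps) * r_delta
    best: Optional[T] = None
    best_d = math.inf
    for i, p in enumerate(data):
        d = dist(query, p)
        if d < best_d:
            best, best_d = p, d
        if best_d <= threshold:
            # If the true NN distance r* >= r_delta (prob. >= 1 - delta under the
            # model), then best_d <= (1 + eps) * r*, i.e. the AC bound holds.
            return best, best_d, i + 1
    return best, best_d, len(data)
```

For small δ the quantile 1 - (1 - δ)^(1/n) is roughly δ/n, so a plain empirical estimate of F_q needs on the order of n/δ sampled distances before it can resolve r_δ; below that, the helper falls back to 0.0 and the scan runs to completion, which keeps the result exact rather than risking an unjustified early stop.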
Keywords :
data mining; database indexing; probability; query processing; visual databases; PAC nearest neighbor queries; approximately correct algorithms; controlled search; error bound; high-dimensional spaces; index structures; metric spaces; query object; sequential algorithm; Costs; Data mining; Error correction; Extraterrestrial measurements; Extraterrestrial phenomena; Identity-based encryption; Nearest neighbor searches; Neural networks; Partitioning algorithms; Pattern recognition
Conference_Title :
Data Engineering, 2000. Proceedings. 16th International Conference on
Conference_Location :
San Diego, CA
Print_ISBN :
0-7695-0506-6
DOI :
10.1109/ICDE.2000.839417