• DocumentCode
    1961138
  • Title
    PAC nearest neighbor queries: Approximate and controlled search in high-dimensional and metric spaces
  • Author
    Ciaccia, Paolo ; Patella, Marco
  • Author_Institution
    Dipt. di Elettronica Inf. e Sistemistica, Bologna Univ., Italy
  • fYear
    2000
  • fDate
    2000
  • Firstpage
    244
  • Lastpage
    255
  • Abstract
    In high-dimensional and complex metric spaces, determining the nearest neighbor (NN) of a query object q can be very expensive, because index structures partition such spaces poorly (the so-called “curse of dimensionality”). This also affects approximately correct (AC) algorithms, which return as a result a point whose distance from q is less than (1+ε) times the distance between q and its true NN. In this paper we introduce a new approach to approximate similarity search, called PAC-NN queries, in which the error bound ε can be exceeded with probability δ, and both ε and δ can be tuned at query time to trade result quality for search cost. We describe sequential and index-based PAC-NN algorithms that exploit the distance distribution of the query object to determine a stopping condition that respects the error bound. Analysis and experimental evaluation of the sequential algorithm confirm that, for moderately large data sets and suitable ε and δ values, PAC-NN queries can be solved efficiently while keeping the error under control. We then provide experimental evidence that indexing can further speed up retrieval by one to two orders of magnitude without sacrificing the accuracy of the result.
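    The abstract's sequential algorithm can be illustrated with a minimal sketch. This is our reconstruction under stated assumptions, not necessarily the paper's exact formulation. It assumes that the n data objects behave like independent draws from the query's distance distribution F_q. Under that assumption, the probability that the true NN lies within distance r of q is G_q(r) = 1 - (1 - F_q(r))^n. Solving G_q(r_δ) = δ gives F_q(r_δ) = 1 - (1 - δ)^(1/n). A linear scan can then stop as soon as its best candidate lies within (1+ε)·r_δ, because the relative error can exceed ε only if the true NN is closer than r_δ, which happens with probability about δ. The sketch also assumes that F_q is approximated by a sample of distances. The names `pac_radius`, `pac_nn_sequential`, and `empirical_quantile` are hypothetical.

```python
import math
from typing import Callable, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def empirical_quantile(sample: Sequence[float], p: float) -> float:
    """Return the p-quantile (0 <= p <= 1) of a sample of distances."""
    s = sorted(sample)
    idx = min(len(s) - 1, max(0, math.ceil(p * len(s)) - 1))
    return s[idx]


def pac_radius(dist_sample: Sequence[float], n: int, delta: float) -> float:
    """Radius r_delta with P(true NN distance <= r_delta) ~= delta.

    Assumption: the n data distances are i.i.d. with CDF F_q, approximated
    here by an empirical sample. Then G(r) = 1 - (1 - F_q(r))**n, and
    G(r) = delta gives F_q(r) = 1 - (1 - delta)**(1/n). For very small
    quantiles, the empirical estimate is coarse (it bottoms out at the
    sample minimum).
    """
    if delta <= 0.0:
        return 0.0  # no probabilistic slack: only an exact match stops early
    p = 1.0 - (1.0 - delta) ** (1.0 / n)
    return empirical_quantile(dist_sample, p)


def pac_nn_sequential(
    query: T,
    data: Sequence[T],
    dist: Callable[[T, T], float],
    eps: float,
    delta: float,
    dist_sample: Sequence[float],
) -> Tuple[Optional[T], float, int]:
    """Scan data sequentially, stopping once the best distance <= (1+eps)*r_delta.

    Returns (best object, its distance, number of objects examined).
    """
    threshold = (1.0 + eps) * pac_radius(dist_sample, len(data), delta)
    best: Optional[T] = None
    best_d = math.inf
    for examined, obj in enumerate(data, start=1):
        d = dist(query, obj)
        if d < best_d:
            best, best_d = obj, d
        if best_d <= threshold:  # PAC stopping condition reached
            return best, best_d, examined
    return best, best_d, len(data)  # full scan: exact NN
```

For example, with 1-D points and `dist = lambda a, b: abs(a - b)`, a large δ makes the scan stop after only a few objects. A δ close to 0 pushes the threshold toward 0, and the scan degenerates into an exact linear search. The same stopping test could be reused inside an index-based search. The abstract reports that variant but does not detail it.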
  • Keywords
    data mining; database indexing; probability; query processing; visual databases; PAC nearest neighbor queries; approximately correct algorithms; controlled search; error bound; high-dimensional spaces; index structures; metric spaces; query object; sequential algorithm; Costs; Data mining; Error correction; Extraterrestrial measurements; Extraterrestrial phenomena; Identity-based encryption; Nearest neighbor searches; Neural networks; Partitioning algorithms; Pattern recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering, 2000. Proceedings. 16th International Conference on
  • Conference_Location
    San Diego, CA
  • ISSN
    1063-6382
  • Print_ISBN
    0-7695-0506-6
  • Type
    conf
  • DOI
    10.1109/ICDE.2000.839417
  • Filename
    839417