• DocumentCode
    2158594
  • Title

    High-Dimensional Similarity Searches Using A Metric Pseudo-Grid

  • Author

    Digout, Christian ; Nascimento, Mario A.

  • Author_Institution
    Univ. of Alberta, Canada
  • fYear
    2005
  • fDate
    05-08 April 2005
  • Firstpage
    1174
  • Lastpage
    1174
  • Abstract
    Despite the proposal of numerous tree-based access structures for high-dimensional similarity searches, techniques based on a sequential scan have been shown to be simple yet quite efficient alternatives. Given that random accesses to disk are expensive, a linear scan of the (smaller) pre-processed dataset is often much more efficient than even a relatively small number of random disk accesses yielded by tree-based indices. In this paper we present a technique which uses a pseudo-partition of a general metric space analogous to the VA-File’s partition of the vector space. The rationale is to use a number of pivot objects in the metric space, each one determining a number of hyper-rings in this space. The intersections of those rings determine pseudo-cells analogous to the VA-File cells in the vector space. In order to speed up query processing, the dataset is clustered (using any applicable clustering technique). Clusters not intersecting cells intersected by the query region cannot contribute to the answer set. Thus, only a few clusters are searched using an I/O-efficient linear scan of each cluster’s data. The proposed technique, which we call the M-GRID, is, by construction, applicable to both general metric spaces and traditional vector spaces as long as a metric distance function is used. The M-GRID is robust to several parameters, and experiments with synthetic and real datasets show that it is able to perform nearest neighbor queries up to 10 times faster than the VA-File.
  • Keywords
    Clustering algorithms; Extraterrestrial measurements; Indexing; Information retrieval; Nearest neighbor searches; Proposals; Protein sequence; Query processing; Robustness; Vectors;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering Workshops, 2005. 21st International Conference on
  • Print_ISBN
    0-7695-2657-8
  • Type

    conf

  • DOI
    10.1109/ICDE.2005.226
  • Filename
    1647780
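
The pivot, hyper-ring, and pseudo-cell idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: equal-width rings, the Euclidean metric, a range query (rather than the nearest neighbor queries evaluated in the paper), and all function and parameter names (`ring_index`, `cell_of`, `ring_width`, `num_rings`, and so on) are assumptions made for illustration. Cell signatures are also recomputed on every call here, whereas a real index would presumably store them per cluster.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]
Metric = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    """Any metric works; Euclidean distance is used only as an example."""
    return math.dist(a, b)


def ring_index(d: float, ring_width: float, num_rings: int) -> int:
    """Map a distance to a pivot onto a ring number (the last ring is open-ended)."""
    return min(int(d // ring_width), num_rings - 1)


def cell_of(obj: Point, pivots: List[Point], ring_width: float,
            num_rings: int, metric: Metric = euclidean) -> Tuple[int, ...]:
    """Pseudo-cell signature: one ring number per pivot."""
    return tuple(ring_index(metric(obj, p), ring_width, num_rings) for p in pivots)


def query_ring_ranges(q: Point, radius: float, pivots: List[Point],
                      ring_width: float, num_rings: int,
                      metric: Metric = euclidean) -> List[Tuple[int, int]]:
    """For each pivot, the (lowest, highest) ring touched by the query ball."""
    ranges = []
    for p in pivots:
        d = metric(q, p)
        lo = ring_index(max(d - radius, 0.0), ring_width, num_rings)
        hi = ring_index(d + radius, ring_width, num_rings)
        ranges.append((lo, hi))
    return ranges


def cluster_may_intersect(cluster: List[Point], ranges: List[Tuple[int, int]],
                          pivots: List[Point], ring_width: float, num_rings: int,
                          metric: Metric = euclidean) -> bool:
    """True if any cell occupied by the cluster lies inside the query's ring ranges."""
    cells = {cell_of(o, pivots, ring_width, num_rings, metric) for o in cluster}
    return any(all(lo <= c <= hi for c, (lo, hi) in zip(cell, ranges))
               for cell in cells)


def range_query(q: Point, radius: float, clusters: List[List[Point]],
                pivots: List[Point], ring_width: float, num_rings: int,
                metric: Metric = euclidean) -> List[Point]:
    """Skip clusters whose cells miss the query region; linearly scan the rest."""
    ranges = query_ring_ranges(q, radius, pivots, ring_width, num_rings, metric)
    result = []
    for cluster in clusters:
        if not cluster_may_intersect(cluster, ranges, pivots,
                                     ring_width, num_rings, metric):
            continue  # pruned: no object in this cluster can be within radius
        result.extend(o for o in cluster if metric(q, o) <= radius)
    return result


if __name__ == "__main__":
    pivots = [(0.0, 0.0), (10.0, 0.0)]
    clusters = [[(1.0, 1.0), (2.0, 1.0)], [(9.0, 9.0), (8.0, 9.0)]]
    print(range_query((1.5, 1.0), 1.0, clusters, pivots, ring_width=2.0, num_rings=8))
```

The pruning step is safe because of the triangle inequality: any object within `radius` of the query has a distance to each pivot that differs from the query's distance to that pivot by at most `radius`, so its ring number must fall inside the computed range for every pivot.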