DocumentCode :
2158594
Title :
High-Dimensional Similarity Searches Using A Metric Pseudo-Grid
Author :
Digout, Christian ; Nascimento, Mario A.
Author_Institution :
Univ. of Alberta, Canada
fYear :
2005
fDate :
05-08 April 2005
Firstpage :
1174
Lastpage :
1174
Abstract :
Despite the proposal of numerous tree-based access structures for high dimensional similarity searches, techniques based on a sequential scan have been shown to be simple yet quite efficient alternatives. Given that random accesses to disk are expensive, a linear scan of the (smaller) pre-processed dataset is often much more efficient than even a relatively small number of random disk accesses yielded by tree-based indices. In this paper we present a technique which uses a pseudo-partition of a general metric space analog to the VA-file’s partition of the vector space. The rationale is to use a number of pivot objects in the metric space, each one determining a number of hyper-rings in this space. The intersection of those rings, determine pseudo-cells analog to the VA-file cells in the vector space. In order to speedup query processing the data set is clustered (using any applicable clustering technique). Clusters not intersecting cells intersected by the query region cannot contribute to the answer set. Thus, only a few clusters are searched using an I/O efficient linear scan of the cluster’s data. The proposed technique, which we call the M-GRID, is, by construction, applicable to both general metric spaces and to traditional vector spaces as long as a metric distance function is used. The M-GRID is robust to several parameters and experiments with synthetic and real data sets show that it is able to perform nearest neighbor queries up to 10 times faster than the VA-File.
Keywords :
Clustering algorithms; Extraterrestrial measurements; Indexing; Information retrieval; Nearest neighbor searches; Proposals; Protein sequence; Query processing; Robustness; Vectors;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Data Engineering Workshops, 2005. 21st International Conference on
Print_ISBN :
0-7695-2657-8
Type :
conf
DOI :
10.1109/ICDE.2005.226
Filename :
1647780
Link To Document :
بازگشت