DocumentCode :
2081624
Title :
Surrogate ranking for very expensive similarity queries
Author :
Xu, Fei ; Jampani, Ravi ; Wu, Mingxi ; Jermaine, Chris ; Kahveci, Tamer
Author_Institution :
CISE Dept., Univ. of Florida, Gainesville, FL, USA
fYear :
2010
fDate :
1-6 March 2010
Firstpage :
848
Lastpage :
859
Abstract :
We consider the problem of similarity search in applications where computing the similarity between two records is very expensive and the similarity measure is not a metric. In such applications, comparing even a tiny fraction of the database records to a single query record can be orders of magnitude slower than reading the entire database from disk, and indexing is often not possible. We develop a general-purpose, statistical framework for answering top-k queries in such databases, when the database administrator is able to supply an inexpensive surrogate ranking function that substitutes for the actual similarity measure. We develop a robust method that learns the relationship between the surrogate function and the similarity measure. Given a query, we use Bayesian statistics to update the model by taking into account the observed partial results. Using the updated model, we construct bounds on the accuracy of the result set obtained via the surrogate ranking. Our experiments show that our models can produce useful bounds for several real-life applications.
Keywords :
Bayes methods; data mining; query formulation; query processing; Bayesian statistics; computing cost; database administrator; database records; expensive similarity queries; query record; similarity search; surrogate ranking; top-k queries; Application software; Atomic measurements; Biochemistry; Computer science; Costs; Databases; Drugs; Indexing; Proteins; Robustness;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Engineering (ICDE), 2010 IEEE 26th International Conference on
Conference_Location :
Long Beach, CA
Print_ISBN :
978-1-4244-5445-7
Electronic_ISBN :
978-1-4244-5444-0
Type :
conf
DOI :
10.1109/ICDE.2010.5447888
Filename :
5447888