DocumentCode :
3121851
Title :
Leveraging COUNT Information in Sampling Hidden Databases
Author :
Dasgupta, Arjun ; Zhang, Nan ; Das, Gautam
Author_Institution :
Univ. of Texas at Arlington, Arlington, TX
fYear :
2009
fDate :
March 29, 2009 - April 2, 2009
Firstpage :
329
Lastpage :
340
Abstract :
A large number of online databases are hidden behind form-like interfaces which allow users to execute search queries by specifying selection conditions in the interface. Most of these interfaces return restricted answers (e.g., only top-k of the selected tuples), while many of them also accompany each answer with the COUNT of the selected tuples. In this paper, we propose techniques which leverage the COUNT information to efficiently acquire unbiased samples of the hidden database. We also discuss variants for interfaces which do not provide COUNT information. We conduct extensive experiments to illustrate the efficiency and accuracy of our techniques.
Keywords :
information retrieval systems; information services; user interfaces; COUNT information; form-like interfaces; hidden databases; online databases; search queries; unbiased samples; Data engineering; Databases; Engineering profession; Government; Sampling methods; Hidden databases; Optimization; Sampling;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Engineering, 2009. ICDE '09. IEEE 25th International Conference on
Conference_Location :
Shanghai
ISSN :
1084-4627
Print_ISBN :
978-1-4244-3422-0
Electronic_ISBN :
1084-4627
Type :
conf
DOI :
10.1109/ICDE.2009.112
Filename :
4812414