• DocumentCode
    3121851
  • Title
    Leveraging COUNT Information in Sampling Hidden Databases
  • Author
    Dasgupta, Arjun; Zhang, Nan; Das, Gautam
  • Author_Institution
    Univ. of Texas at Arlington, Arlington, TX
  • fYear
    2009
  • fDate
    March 29 2009-April 2 2009
  • Firstpage
    329
  • Lastpage
    340
  • Abstract
    A large number of online databases are hidden behind form-like interfaces which allow users to execute search queries by specifying selection conditions in the interface. Most of these interfaces return restricted answers (e.g., only top-k of the selected tuples), while many of them also accompany each answer with the COUNT of the selected tuples. In this paper, we propose techniques which leverage the COUNT information to efficiently acquire unbiased samples of the hidden database. We also discuss variants for interfaces which do not provide COUNT information. We conduct extensive experiments to illustrate the efficiency and accuracy of our techniques.
  • Keywords
    information retrieval systems; information services; user interfaces; COUNT information; form-like interfaces; hidden databases; online databases; search queries; unbiased samples; Data engineering; Databases; Engineering profession; Government; Sampling methods; Hidden databases; Optimization; Sampling
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering, 2009. ICDE '09. IEEE 25th International Conference on
  • Conference_Location
    Shanghai
  • ISSN
    1084-4627
  • Print_ISBN
    978-1-4244-3422-0
  • Electronic_ISBN
    1084-4627
  • Type
    conf
  • DOI
    10.1109/ICDE.2009.112
  • Filename
    4812414