  • DocumentCode
    610391
  • Title
    Breaking the top-k barrier of hidden web databases?
  • Author
    Thirumuruganathan, Saravanan; Zhang, Nan; Das, Goutam
  • Author_Institution
    Univ. of Texas at Arlington, Arlington, TX, USA
  • fYear
    2013
  • fDate
    8-12 April 2013
  • Firstpage
    1045
  • Lastpage
    1056
  • Abstract
    A large number of web databases are only accessible through proprietary form-like interfaces which require users to query the system by entering desired values for a few attributes. A key restriction enforced by such an interface is the top-k output constraint - i.e., when there are a large number of matching tuples, only a few (top-k) of them are preferentially selected and returned by the website, often according to a proprietary ranking function. Since most web database owners set k to be a small value, the top-k output constraint prevents many interesting third-party (e.g., mashup) services from being developed over real-world web databases. In this paper we consider the novel problem of “digging deeper” into such web databases. Our main contribution is the meta-algorithm GetNext that can retrieve the next ranked tuple from the hidden web database using only the restrictive interface of a web database without any prior knowledge of its ranking function. This algorithm can then be called iteratively to retrieve as many top ranked tuples as necessary. We develop principled and efficient algorithms that are based on generating and executing multiple reformulated queries and inferring the next ranked tuple from their returned results. We provide theoretical analysis of our algorithms, as well as extensive experimental results over synthetic and real-world databases that illustrate the effectiveness of our techniques.
  • Keywords
    Web sites; database management systems; query processing; Web site; digging deeper problem; hidden Web database interface; mashup service; matching tuple; meta-algorithm GetNext; next ranked tuple retrieval; proprietary ranking function; real-world Web database; real-world database; reformulated query; synthetic database; third-party service; top-k barrier; top-k output constraint; Algorithm design and analysis; Knowledge engineering; Mashups; Query processing; Silicon; Testing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Data Engineering (ICDE), 2013 IEEE 29th International Conference on
  • Conference_Location
    Brisbane, QLD
  • ISSN
    1063-6382
  • Print_ISBN
    978-1-4673-4909-3
  • Electronic_ISBN
    1063-6382
  • Type
    conf
  • DOI
    10.1109/ICDE.2013.6544896
  • Filename
    6544896
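
The abstract describes inferring the next ranked tuple from the results of multiple reformulated queries. The Python sketch below illustrates only the first half of that idea, under assumptions that are not taken from the paper: a toy in-memory `HiddenDB` class, categorical attributes, conjunctive equality queries, a static ranking, distinct tuples, and k = 1. The names `HiddenDB` and `next_tuple_candidates` are hypothetical. Each reformulated query adds one predicate that excludes the known top-1 tuple, so the true next tuple is guaranteed to appear among the collected answers. Deciding which candidate is actually next requires rank inference, which this sketch does not attempt.

```python
class HiddenDB:
    """Toy stand-in for a hidden web database behind a top-k form interface."""

    def __init__(self, ranked_tuples, k):
        # Tuples are stored in rank order; clients never see this order directly.
        self._tuples = list(ranked_tuples)
        self.k = k
        self.queries_issued = 0

    def query(self, predicates):
        """Answer a conjunctive equality query {attribute_index: value} with the top-k matches."""
        self.queries_issued += 1
        matches = [t for t in self._tuples
                   if all(t[i] == v for i, v in predicates.items())]
        return matches[:self.k]


def next_tuple_candidates(db, base, domains):
    """Return (top-1 tuple of `base`, candidate set containing the next ranked tuple).

    Assumes distinct tuples: the next tuple must differ from the top-1 tuple on
    some attribute not fixed by `base`, so it is the top answer of at least one
    reformulated query of the form base AND attribute = value.
    """
    top = db.query(base)
    if not top:
        return None, set()
    t1 = top[0]
    candidates = set()
    for attr, values in enumerate(domains):
        if attr in base:
            continue  # attribute already fixed by the base query
        for v in values:
            if v == t1[attr]:
                continue  # this reformulation would still match t1
            result = db.query({**base, attr: v})
            if result:
                candidates.add(result[0])
    return t1, candidates


# Hypothetical data: three categorical attributes, tuples listed in hidden rank order.
domains = [("a", "b"), ("x", "y"), (0, 1)]
ranked = [("a", "x", 0), ("b", "x", 1), ("a", "y", 0), ("b", "y", 1), ("a", "x", 1)]
db = HiddenDB(ranked, k=1)
t1, cands = next_tuple_candidates(db, {}, domains)
```

In this toy run the candidate set is small, but the number of reformulated queries grows with the total number of attribute values, which is why query cost is the natural efficiency measure for this kind of interface.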