• DocumentCode
    2848190
  • Title

    Modeling and managing content changes in text databases

  • Author

    Ipeirotis, Panagiotis G. ; Ntoulas, Alexandros ; Cho, Junghoo ; Gravano, Luis

  • Author_Institution
    New York Univ., NY, USA
  • fYear
    2005
  • fDate
    5-8 April 2005
  • Firstpage
    606
  • Lastpage
    617
  • Abstract
    Large amounts of (often valuable) information are stored in Web-accessible text databases. "Metasearchers" provide unified interfaces to query multiple such databases at once. For efficiency, metasearchers rely on succinct statistical summaries of the database contents to select the best databases for each query. So far, database selection research has largely assumed that databases are static, so the associated statistical summaries do not need to change over time. However, databases are rarely static and the statistical summaries that describe their contents need to be updated periodically to reflect content changes. In this paper, we first report the results of a study showing how the content summaries of 152 real Web databases evolved over a period of 52 weeks. Then, we show how to use "survival analysis" techniques in general, and Cox's proportional hazards regression in particular, to model database changes over time and predict when we should update each content summary. Finally, we exploit our change model to devise update schedules that keep the summaries up to date by contacting databases only when needed, and then we evaluate the quality of our schedules experimentally over real Web databases.
  • Keywords
    Internet; content management; query formulation; statistical databases; very large databases; Web-accessible text databases; content change management; query processing; statistical summary; survival analysis; Aggregates; Content management; Databases; Frequency; Hazards; Pattern analysis; Predictive models; Search engines; Statistical analysis; Statistics
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering, 2005. ICDE 2005. Proceedings. 21st International Conference on
  • ISSN
    1084-4627
  • Print_ISBN
    0-7695-2285-8
  • Type

    conf

  • DOI
    10.1109/ICDE.2005.91
  • Filename
    1410178