• DocumentCode
    1160243
  • Title

    Detecting and representing relevant Web deltas in WHOWEDA

  • Author

    Bhowmick, Sourav S. ; Madria, Sanjay Kumar ; Ng, Wee Keong

  • Author_Institution
    Sch. of Comput. Eng., Nanyang Technol. Univ., Singapore
  • Volume
    15
  • Issue
    2
  • fYear
    2003
  • Firstpage
    423
  • Lastpage
    441
  • Abstract
    In this paper, we present a mechanism for detecting and representing changes, given the old and new versions of a set of interlinked Web documents, retrieved in response to a user's query. In particular, we show how to detect and represent Web deltas, i.e., changes in the Web documents that are relevant to a user's query in the context of our Web warehousing system called WHOWEDA (Warehouse of Web Data). In WHOWEDA, Web information is materialized views stored in Web tables in the form of Web tuples. These Web tuples, represented as directed graphs, can be manipulated using a set of Web algebraic operators. In this paper, we present a mechanism to detect relevant Web deltas using Web algebraic operators such as the Web join and the outer Web join. Web join is used to detect identical documents residing in two Web tables, whereas outer Web join, a derivative of Web join, is used to identify dangling Web tuples. We show how to represent these changes using delta Web tables. We develop formal algorithms for the generation of delta Web tables identifying Web documents which have been added, deleted, or modified since the last query.
  • Keywords
    Internet; data warehouses; information retrieval; query processing; WHOWEDA; Web documents; Web join; Web warehouse; Web warehousing; delta Web tables; interlinked Web documents; query; Change detection algorithms; Competitive intelligence; Computer Society; Databases; Diseases; Drugs; Helium; Monitoring; Warehousing; Web pages
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/TKDE.2003.1185843
  • Filename
    1185843