• DocumentCode
    3088172
  • Title
    Cost-benefit analysis of Web bag in a Web warehouse
  • Author
    Bhowmick, Sourav S.; Madria, Sanjay; Ng, Wee-Keong; Lim, Ee-Peng
  • Author_Institution
    Centre for Adv. Inf. Syst., Nanyang Technol. Univ., Singapore
  • fYear
    1999
  • fDate
    Aug 1999
  • Firstpage
    34
  • Lastpage
    42
  • Abstract
    Sets and bags are closely related structures and have been studied in relational databases. A bag differs from a set in that it is sensitive to the number of times an element occurs, while a set is not. In this paper, we introduce the concept of a Web bag in the context of a World Wide Web warehouse called WHOWEDA (WareHouse Of WEb DAta), which we are currently building. Informally, a Web bag is a Web table which allows multiple occurrences of identical Web tuples. A Web bag helps one to discover useful knowledge from a Web table, such as visible documents or Web sites (i.e., documents/sites which can be reached by many paths), luminous documents (i.e., documents with many outgoing links) and luminous paths (i.e., frequently traversed paths). We provide a cost-benefit analysis of materializing Web bags as compared to Web tables with distinct Web tuples.
  • Keywords
    cost-benefit analysis; data mining; data structures; data warehouses; information resources; search engines; WHOWEDA; Web bags; Web tables; Web tuples; World Wide Web warehouse; element occurrence; fan-in; fan-out; frequently traversed paths; identical Web types; luminous documents; luminous paths; outgoing links; useful knowledge discovery; visible Web sites; visible documents; Computer science; Cost benefit analysis; Current measurement; Educational technology; Hard disks; Information systems; Read only memory; Search engines; World Wide Web
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Database Engineering and Applications, 1999. IDEAS '99. International Symposium Proceedings
  • Conference_Location
    Montreal, Que.
  • Print_ISBN
    0-7695-0265-2
  • Type
    conf
  • DOI
    10.1109/IDEAS.1999.787249
  • Filename
    787249
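
The set/bag distinction described in the abstract can be illustrated with a toy sketch. This is not the WHOWEDA implementation: the tuple representation (a path of document identifiers), the sample data, and the names `web_tuples`, `distinct_links`, `fan_in`, and `fan_out` are all assumptions made for illustration. The sketch keeps duplicate tuples in a bag to count how often each path occurs, and counts distinct incoming and outgoing links per document as a rough stand-in for the fan-in and fan-out mentioned in the keywords.

```python
from collections import Counter

# Hypothetical Web tuples: each tuple is a path of document identifiers.
web_tuples = [
    ("a", "b", "c"),
    ("a", "b", "c"),  # identical tuple, kept by a bag but collapsed by a set
    ("d", "b", "c"),
    ("a", "e"),
]

# A bag records how many times each tuple occurs; a set only records presence.
bag = Counter(web_tuples)
distinct = set(web_tuples)

# Paths ordered by how often they occur (a toy notion of a "luminous path").
paths_by_frequency = bag.most_common()

# Distinct links (source, destination) that appear anywhere in the tuples.
distinct_links = {
    (path[i], path[i + 1])
    for path in distinct
    for i in range(len(path) - 1)
}

# Fan-in: number of distinct incoming links per document.
fan_in = Counter(dst for _, dst in distinct_links)
# Fan-out: number of distinct outgoing links per document.
fan_out = Counter(src for src, _ in distinct_links)

if __name__ == "__main__":
    print("bag:", dict(bag))
    print("set size:", len(distinct))
    print("paths by frequency:", paths_by_frequency)
    print("fan-in:", dict(fan_in))
    print("fan-out:", dict(fan_out))
```

The only information the set loses here is the multiplicity of each path, which is exactly what the bag keeps; whether retaining it is worth the extra storage is the trade-off the paper analyzes.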