• DocumentCode
    2392065
  • Title
    Web and Document Databases: An Effective Way to Explore the Internet
  • Author
    Chen, Yangjun
  • Author_Institution
    Dept. Appl. Comput. Sci., Univ. of Winnipeg, Winnipeg, MB, Canada
  • fYear
    2010
  • fDate
    18-20 Aug. 2010
  • Firstpage
    529
  • Lastpage
    534
  • Abstract
    In this paper, we discuss the architecture of a system, the so-called Web and Document Databases (WDDBS for short), designed to explore the Internet effectively and efficiently. Abstractly, a WDDBS can be defined as a triple <D, P, W>, where (1) D stands for a local document database to store XML documents, (2) P for a subsystem responsible for remote query evaluation, including resolution of semantic conflicts among heterogeneous databases, and (3) W for a Web crawler which should be able to find information sources related to the local database in some way. Then, each information source can be organized into a WDDB distributed over the Internet, which may be connected to others through URLs. A query submitted to a WDDBS will first be evaluated against the local document database, and then possibly switched over to some remote document databases if necessary, which is controlled by the 'knowledge' on how local WDDBSs are connected. In this way, the load of traffic over the Internet can effectively be decreased, while the information explored is more relevant.
  • Keywords
    Internet; XML; database management systems; query processing; WDDBS; Web and document databases; Web crawler; XML documents; heterogeneous databases; local document database; remote query evaluation; system architecture; Books; Crawlers; Ontologies; Web; XML document; hash tables; semantic conflict resolution; signature trees; tree pattern queries
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer and Information Science (ICIS), 2010 IEEE/ACIS 9th International Conference on
  • Conference_Location
    Yamagata
  • Print_ISBN
    978-1-4244-8198-9
  • Type
    conf
  • DOI
    10.1109/ICIS.2010.66
  • Filename
    5590411