• DocumentCode
    3140734
  • Title
    Semi-automatic wrapper generation for Internet information sources
  • Author
    Ashish, Naveen ; Knoblock, Craig A.
  • Author_Institution
    Inf. Sci. Inst., Univ. of Southern California, Marina del Rey, CA, USA
  • fYear
    1997
  • fDate
    24-27 Jun 1997
  • Firstpage
    160
  • Lastpage
    169
  • Abstract
    To simplify the task of obtaining information from the vast number of information sources available on the World Wide Web (WWW), the authors are building information mediators that extract and integrate data from multiple Web sources. In a mediator-based approach, wrappers are built around individual information sources to translate between the mediator query language and the individual sources. The authors present an approach for semi-automatically generating wrappers for structured Internet sources. The key idea is to exploit formatting information in Web pages to hypothesize the underlying structure of a page. From this structure, the system generates a wrapper that facilitates querying a source and possibly integrating it with other sources. The authors demonstrate the ease with which they can build wrappers for a number of Web sources using their implemented wrapper generation toolkit.
  • Keywords
    Internet; distributed databases; information retrieval; query languages; query processing; Internet information sources; Web pages; World Wide Web; data extraction; data integration; formatting information; information mediators; mediator query language; multiple Web sources; semi-automatic wrapper generation; structured Internet source; wrapper generation toolkit; Buildings; Computer science; Contracts; Data mining; Database languages; Government; Web sites
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Proceedings of the Second IFCIS International Conference on Cooperative Information Systems (COOPIS '97), 1997
  • Conference_Location
    Kiawah Island, SC
  • Print_ISBN
    0-8186-7946-8
  • Type
    conf
  • DOI
    10.1109/COOPIS.1997.613813
  • Filename
    613813