• DocumentCode
    2042079
  • Title
    Wrapper generation for Web accessible data sources
  • Author
    Gruser, Jean-Robert ; Raschid, Louiqa ; Vidal, Maria Esther ; Bright, Laura
  • Author_Institution
    Maryland Univ., College Park, MD, USA
  • fYear
    1998
  • fDate
    22-22 Aug. 1998
  • Firstpage
    14
  • Lastpage
    23
  • Abstract
    There is an increase in the number of data sources that can be queried across the WWW. Such sources typically support HTML forms-based interfaces, and search engines query collections of suitably indexed data. The data is displayed via a browser. One drawback to these sources is that there is no standard programming interface suitable for applications to submit queries. Second, the output (answer to a query) is not well structured. Structured objects have to be extracted from the HTML documents which contain irrelevant data and which may be volatile. Third, domain knowledge about the data source is also embedded in HTML documents and must be extracted. To solve these problems, we present technology to define and (automatically) generate wrappers for Web accessible sources. Our contributions are as follows: (1) Defining a wrapper interface to specify the capability of Web accessible data sources. (2) Developing a wrapper generation toolkit of graphical interfaces and specification languages to specify the capability of sources and the functionality of the wrapper. (3) Developing the technology to automatically generate a wrapper appropriate to the Web accessible source, from the specifications.
  • Keywords
    Internet; application program interfaces; query processing; HTML documents; WWW; search engines; wrapper generation toolkit; Data mining; Databases; Educational institutions; Electrical capacitance tomography; HTML; Read only memory; Search engines; Specification languages; Uniform resource locators; World Wide Web
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Cooperative Information Systems, 1998. Proceedings. 3rd IFCIS International Conference on
  • Conference_Location
    New York, NY, USA
  • Print_ISBN
    0-8186-8380-5
  • Type
    conf
  • DOI
    10.1109/COOPIS.1998.706180
  • Filename
    706180