• DocumentCode
    3200027
  • Title
    Web services for information extraction from the Web
  • Author
    Habegger, Benjamin; Quafafou, Mohamed
  • Author_Institution
    Lab. d'Informatique de Nantes Atlantique, Nantes Univ., France
  • fYear
    2004
  • fDate
    6-9 July 2004
  • Firstpage
    279
  • Lastpage
    286
  • Abstract
    Extracting information from the Web is a complex task made up of components that can be either generic or specific to the task: downloading a given page, following links, querying a Web-based application via an HTML form and the HTTP protocol, querying a Web service via the SOAP protocol, etc. Web services that execute an information extraction task therefore cannot simply be hard-coded (i.e. written and compiled once and for all in a given programming language). To build flexible information extraction Web services, we need to be able to compose different subtasks. We propose an XML-based language to describe information extraction Web services as compositions of existing Web services and specific functions. The usefulness of the proposed framework is demonstrated by three real-world applications. (1) Search engines: we show how to describe a task which queries Google's Web service, retrieves more information on the results by querying their respective HTTP servers, and filters them according to this information. (2) E-commerce sites: an information extraction Web service giving access to an existing HTML-based e-commerce online application such as Amazon is built. (3) Patent extraction: a last example shows how to describe an information extraction Web service that queries a Web-based application, extracts the set of result links, follows them, and extracts the needed information from the result pages. In all three applications the generated description can easily be modified and completed to better meet the user's needs and create value-added Web services.
  • Keywords
    Web sites; XML; electronic commerce; information filters; information retrieval; knowledge acquisition; search engines; Amazon; Google Web service; HTML; HTML-based e-commerce online application; HTTP protocol; HTTP servers; SOAP protocol; Web information extraction; Web links; Web page downloading; Web service querying; Web-based applications; XML-based language; e-commerce sites; information filtering; information retrieval; patent extraction; search engines; value-added Web services; Computer languages; Data mining; HTML; Information filtering; Information filters; Information retrieval; Search engines; Simple object access protocol; Web server; Web services;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Services, 2004. Proceedings. IEEE International Conference on
  • Print_ISBN
    0-7695-2167-3
  • Type
    conf
  • DOI
    10.1109/ICWS.2004.1314749
  • Filename
    1314749