• DocumentCode
    892834
  • Title

    From Wrapping to Knowledge

  • Author

    Arjona, José L. ; Corchuelo, Rafael ; Ruiz, David ; Toro, Miguel

  • Author_Institution
    Departamento de Electronica, Sistemas Informaticos y Automatica, Escuela Politecnica Superior, Huelva
  • Volume
    19
  • Issue
    2
  • fYear
    2007
  • Firstpage
    310
  • Lastpage
    323
  • Abstract
    One of the most challenging problems for enterprise information integration is to deal with heterogeneous information sources on the Web. The reason is that they usually provide information that is in human-readable form only, which makes it difficult for a software agent to understand it. Current solutions build on the idea of annotating the information with semantics. If the information is unstructured, proposals such as S-CREAM, MnM, or Armadillo may be effective enough since they rely on using natural language processing techniques; furthermore, their accuracy can be improved by using redundant information on the Web, as C-PANKOW has proved recently. If the information is structured and closely related to a back-end database, deep annotation ranges among the most effective proposals, but it requires the information providers to modify their applications; if deep annotation is not applicable, the easiest solution consists of using a wrapper and transforming its output into annotations. In this paper, we prove that this transformation can be automated by means of an efficient, domain-independent algorithm. To the best of our knowledge, this is the first attempt to devise and formalize such a systematic, general solution.
  • Keywords
    Internet; business data processing; natural language processing; software agents; text analysis; C-PANKOW; S-CREAM; World Wide Web; back-end database; deep annotation; domain-independent algorithm; enterprise information integration; human-readable form only; information sources; natural language processing techniques; redundant information; semiautomatic annotation; software agent; wrappers; Application software; Data mining; Databases; Information resources; Natural language processing; Proposals; Semantic Web; Software agents; Web pages; Wrapping
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/TKDE.2007.31
  • Filename
    4039292