DocumentCode
892834
Title
From Wrapping to Knowledge
Author
Arjona, José L. ; Corchuelo, Rafael ; Ruiz, David ; Toro, Miguel
Author_Institution
Departamento de Electronica, Sistemas Informaticos y Automatica, Escuela Politecnica Superior, Huelva
Volume
19
Issue
2
fYear
2007
Firstpage
310
Lastpage
323
Abstract
One of the most challenging problems for enterprise information integration is dealing with heterogeneous information sources on the Web. The reason is that they usually provide information in human-readable form only, which makes it difficult for a software agent to understand. Current solutions build on the idea of annotating the information with semantics. If the information is unstructured, proposals such as S-CREAM, MnM, or Armadillo may be effective enough, since they rely on natural language processing techniques; furthermore, their accuracy can be improved by using redundant information on the Web, as C-PANKOW has recently shown. If the information is structured and closely related to a back-end database, deep annotation ranks among the most effective proposals, but it requires the information providers to modify their applications; if deep annotation is not applicable, the easiest solution consists of using a wrapper and transforming its output into annotations. In this paper, we prove that this transformation can be automated by means of an efficient, domain-independent algorithm. To the best of our knowledge, this is the first attempt to devise and formalize such a systematic, general solution.
Keywords
Internet; business data processing; natural language processing; software agents; text analysis; C-PANKOW; S-CREAM; World Wide Web; back-end database; deep annotation; domain-independent algorithm; enterprise information integration; human-readable form only; information sources; natural language processing techniques; redundant information; semiautomatic annotation; software agent; wrappers; Application software; Data mining; Databases; Information resources; Natural language processing; Proposals; Semantic Web; Software agents; Web pages; Wrapping; Enterprise information integration; semiautomatic annotation; wrappers
fLanguage
English
Journal_Title
Knowledge and Data Engineering, IEEE Transactions on
Publisher
IEEE
ISSN
1041-4347
Type
jour
DOI
10.1109/TKDE.2007.31
Filename
4039292