DocumentCode
3140734
Title
Semi-automatic wrapper generation for Internet information sources
Author
Ashish, Naveen ; Knoblock, Craig A.
Author_Institution
Inf. Sci. Inst., Univ. of Southern California, Marina del Rey, CA, USA
fYear
1997
fDate
24-27 Jun 1997
Firstpage
160
Lastpage
169
Abstract
To simplify the task of obtaining information from the vast number of information sources available on the World Wide Web (WWW), the authors are building information mediators for extracting and integrating data from multiple Web sources. In a mediator-based approach, wrappers are built around individual information sources to translate between the mediator query language and the individual sources. They present an approach for semi-automatically generating wrappers for structured Internet sources. The key idea is to exploit formatting information in Web pages to hypothesize the underlying structure of a page. From this structure, the system generates a wrapper that facilitates querying a source and possibly integrating it with other sources. They demonstrate the ease with which they are able to build wrappers for a number of Web sources using their implemented wrapper generation toolkit.
Keywords
Internet; distributed databases; information retrieval; query languages; query processing; Internet information sources; Web pages; World Wide Web; data extraction; data integration; formatting information; information mediators; mediator query language; multiple Web sources; semi-automatic wrapper generation; structured Internet source; wrapper generation toolkit; Buildings; Computer science; Contracts; Data mining; Database languages; Government; Internet; Web pages; Web sites; World Wide Web;
fLanguage
English
Publisher
IEEE
Conference_Titel
Cooperative Information Systems, 1997. COOPIS '97., Proceedings of the Second IFCIS International Conference on
Conference_Location
Kiawah Island, SC
Print_ISBN
0-8186-7946-8
Type
conf
DOI
10.1109/COOPIS.1997.613813
Filename
613813