DocumentCode :
1962348
Title :
XWRAP: an XML-enabled wrapper construction system for Web information sources
Author :
Liu, Ling ; Pu, Calton ; Han, Wei
Author_Institution :
Coll. of Comput., Georgia Inst. of Technol., Atlanta, GA, USA
fYear :
2000
fDate :
2000
Firstpage :
611
Lastpage :
621
Abstract :
The paper describes the methodology and the software development of XWRAP, an XML-enabled wrapper construction system for semi-automatic generation of wrapper programs. By XML-enabled we mean that the metadata about information content that are implicit in the original Web pages will be extracted and encoded explicitly as XML tags in the wrapped documents. In addition, the query-based content filtering process is performed against the XML documents. The XWRAP wrapper generation framework has three distinct features. First, it explicitly separates tasks of building wrappers that are specific to a Web source from the tasks that are repetitive for any source, and uses a component library to provide basic building blocks for wrapper programs. Second, it provides a user-friendly interface program to allow wrapper developers to generate their wrapper code with a few mouse clicks. Third and most importantly, we introduce and develop a two-phase code generation framework. The first phase utilizes an interactive interface facility to encode the source-specific metadata knowledge identified by individual wrapper developers as declarative information extraction rules. The second phase combines the information extraction rules generated at the first phase with the XWRAP component library to construct an executable wrapper program for the given Web source. We report the initial experiments on the performance of the XWRAP code generation system and the wrapper programs generated by XWRAP.
Keywords :
desktop publishing; hypermedia markup languages; information resources; interactive systems; meta data; user interfaces; Web information sources; Web pages; Web source; XML documents; XML tags; XML-enabled wrapper construction system; XWRAP; component library; declarative information extraction rules; executable wrapper program; information content; information extraction rules; interactive interface facility; metadata; query based content filtering process; semi-automatic generation; source-specific metadata knowledge; two-phase code generation framework; user friendly interface program; wrapped documents; wrapper developers; wrapper programs; Data mining; Educational institutions; HTML; Internet; Libraries; Output feedback; Software testing; System testing; Wrapping; XML;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Engineering, 2000. Proceedings. 16th International Conference on
Conference_Location :
San Diego, CA
ISSN :
1063-6382
Print_ISBN :
0-7695-0506-6
Type :
conf
DOI :
10.1109/ICDE.2000.839475
Filename :
839475