Title :
SG-WRAM schema guided wrapper maintenance: a demonstration
Author :
Meng, Xiaofeng ; Wang, Haiyan ; Hu, Dongdong ; Gu, Mingzhe
Author_Institution :
Inf. Sch., Renmin Univ. of China, Beijing, China
Abstract :
We propose a novel schema-guided approach to wrapper maintenance, called SG-WRAM. SG-WRAP can generate a wrapper that extracts data from an HTML document and produces an XML document conforming to a user-defined schema. SG-WRAM maintains such a wrapper in four sequential steps. First, syntactic features, data patterns and notations are obtained from the schema, the previous extraction rule and the previously extracted results. Second, these features are used to recognize the data items in the changed page. Third, the recognized items are grouped according to the given schema, so that each group is an instance of that schema. Finally, representative instances are selected to re-induce the extraction rule. We call these four steps feature discovery, item recovery, block configuration and wrapper reparation, respectively. The demonstrated system is implemented in Java. We also discuss the major algorithms used in SG-WRAM.
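The four steps named in the abstract can be illustrated with a small sketch. It is not the authors' Java implementation or their algorithms. It is a toy pipeline under stated assumptions: a page is modelled as a list of hypothetical (tag path, text) pairs, the only feature used is a crude character-shape data pattern, and a "rule" is simply the most common tag path per schema field. All function names (`shape`, `discover_features`, `recover_items`, `configure_blocks`, `repair_wrapper`) are invented for this illustration.

```python
import re
from collections import Counter


def shape(text):
    """Assumed data pattern: runs of words become 'a', runs of digits become '9'."""
    s = re.sub(r"[A-Za-z]+(?:\s+[A-Za-z]+)*", "a", text)
    return re.sub(r"\d+", "9", s)


def discover_features(schema_fields, old_results):
    """Step 1 (feature discovery): collect value shapes per schema field
    from the results the old wrapper extracted."""
    return {
        field: {shape(rec[field]) for rec in old_results if field in rec}
        for field in schema_fields
    }


def recover_items(page, features):
    """Step 2 (item recovery): label each text node of the changed page
    with the first field whose known shapes match it."""
    items = []
    for path, text in page:
        for field, shapes in features.items():
            if shape(text) in shapes:
                items.append((field, path, text))
                break
    return items


def configure_blocks(items):
    """Step 3 (block configuration): group items in document order;
    a repeated field starts a new schema instance."""
    blocks, current = [], {}
    for field, path, text in items:
        if field in current:
            blocks.append(current)
            current = {}
        current[field] = (path, text)
    if current:
        blocks.append(current)
    return blocks


def repair_wrapper(blocks, schema_fields):
    """Step 4 (wrapper reparation): keep complete instances as representatives
    and take the most common tag path per field as the new toy rule."""
    representatives = [b for b in blocks if all(f in b for f in schema_fields)]
    rule = {}
    for field in schema_fields:
        paths = Counter(b[field][0] for b in representatives)
        if paths:
            rule[field] = paths.most_common(1)[0][0]
    return rule


def maintain(schema_fields, old_results, page):
    """Run the four steps in sequence and return the re-induced toy rule."""
    features = discover_features(schema_fields, old_results)
    items = recover_items(page, features)
    blocks = configure_blocks(items)
    return repair_wrapper(blocks, schema_fields)
```

For example, with schema fields `["title", "price"]`, old results such as `{"title": "Data Mining", "price": "$12.50"}`, and a changed page whose titles now sit under a hypothetical path `div/h2` and prices under `div/span`, `maintain` would map each field to its new path. A real system would rely on much richer features and a proper rule induction step than this shape matching.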
Keywords :
Internet; XML; information resources; HTML document; SG-WRAM; XML document; block configuration; data pattern; feature discovery; item recovery; schema guided wrapper maintenance; syntactic feature; user-defined schema; Data engineering;
Conference_Title :
Data Engineering, 2003. Proceedings. 19th International Conference on
Print_ISBN :
0-7803-7665-X
DOI :
10.1109/ICDE.2003.1260856