DocumentCode :
2710517
Title :
Research for Information Extraction Based on Wrapper Model Algorithm
Author :
Zhiwei, Xu ; Xinghua, Wang
Author_Institution :
Dept. of Comput. Sci., Chang Chun Univ., Chang Chun, China
fYear :
2010
fDate :
7-10 May 2010
Firstpage :
652
Lastpage :
655
Abstract :
This paper reports research and experiments on information extraction from data-intensive Web sites, using a method that automatically generates wrappers for Web pages. The core of the method is a page tree matching algorithm. The DOM trees of two pages, a sample tree and a wrapper tree, are matched and compared, first to discover the patterns of the page and produce a primary template. The primary template is then corrected iteratively as further patterns are found, and finally the page wrapper is generated. Wrapper generation requires no human intervention and is fully automated. The experiments gave satisfactory results.
Keywords :
Internet; Web sites; information retrieval; iterative methods; trees (mathematics); Web pages; data-intensive Web site research experiment; information extraction; page tree matching algorithm; primary template found iterative model; sample tree; tree wrapper DOM tree matching; wrapper model algorithm; Computer science; Data mining; Databases; HTML; Humans; Information technology; Iterative algorithms; Iterative methods; Research and development; Web pages; DOM tree; information extraction; match technology; wrapper;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer Research and Development, 2010 Second International Conference on
Conference_Location :
Kuala Lumpur
Print_ISBN :
978-0-7695-4043-6
Type :
conf
DOI :
10.1109/ICCRD.2010.141
Filename :
5489547