DocumentCode :
2237051
Title :
Multiple Wrappers Information Extraction Method Based on Tree Model
Author :
Suo, Hongguang ; Feng, Lin ; Zhang, Yong
Author_Institution :
Comput. & Commun. Eng. Dept., China Univ. of Pet., Dongying, China
fYear :
2009
fDate :
26-28 Dec. 2009
Firstpage :
904
Lastpage :
907
Abstract :
Previous information extraction wrappers are mostly designed for web pages that contain a single information block, so they cannot handle web pages with multiple information blocks. To address this shortcoming, this paper proposes a multiple-wrapper information extraction method based on a tree model. The method generates a wrapper for every block using the Tree-Align algorithm put forward in this paper, and then uses each wrapper to extract the structured information corresponding to its block. According to the experiments, the new method extracts information from web pages with multiple information blocks accurately and efficiently.
Keywords :
Internet; information retrieval; Web page; multiple wrappers information extraction method; tree model; tree-align algorithm; Data mining; Displays; Fuel economy; HTML; Information science; Internet; Military computing; Petroleum; Strips; Web pages;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information Science and Engineering (ICISE), 2009 1st International Conference on
Conference_Location :
Nanjing
Print_ISBN :
978-1-4244-4909-5
Type :
conf
DOI :
10.1109/ICISE.2009.770
Filename :
5455704