Title :
Multiple Wrappers Information Extraction Method Based on Tree Model
Author :
Suo, Hongguang ; Feng, Lin ; Zhang, Yong
Author_Institution :
Comput. & Commun. Eng. Dept., China Univ. of Pet., Dongying, China
Abstract :
Existing information extraction wrappers are mostly designed for web pages that contain a single information block, so they cannot handle web pages with multiple information blocks. To address this shortcoming, this paper proposes a multiple-wrapper information extraction method based on a tree model. The method generates a wrapper for each block using the Tree-Align algorithm introduced in this paper, and then uses each wrapper to extract the structured information from its corresponding block. Experiments show that the method extracts information from web pages with multiple information blocks accurately and efficiently.
Keywords :
Internet; information retrieval; Web page; multiple wrappers information extraction method; tree model; tree-align algorithm; Data mining; Displays; Fuel economy; HTML; Information science; Military computing; Petroleum; Strips; Web pages
Conference_Title :
Information Science and Engineering (ICISE), 2009 1st International Conference on
Conference_Location :
Nanjing
Print_ISBN :
978-1-4244-4909-5
DOI :
10.1109/ICISE.2009.770