DocumentCode
2053578
Title
Web Data Extraction Based on Simple Tree Matching
Author
Wang, Hua ; Zhang, Yang
Author_Institution
Coll. of Inf. Eng., Northwest A&F Univ., Yangling, China
Volume
2
fYear
2010
fDate
14-15 Aug. 2010
Firstpage
15
Lastpage
18
Abstract
Information on the Internet has grown exponentially, and Internet users are overwhelmed by it. An important problem is how to automatically extract useful information from relevant pages so as to give users a convenient and rapid information query platform. In this paper, we present a Web data extraction method built on the simple tree matching algorithm that analyzes the structure and content of Web documents. Experimental results on Web data from several well-known websites show that the proposed method can effectively extract data records from similar Web pages, reaching an extraction precision of about 90%, and can meet the requirement of extracting accurate data in real-life applications.
Keywords
Web services; data mining; query processing; trees (mathematics); Internet; Web data extraction method; Web documents; Web pages; Web sites; information query platform; simple tree matching algorithm; Artificial intelligence; Books; Data mining; Feature extraction; HTML; Heuristic algorithms; Web pages; DOM; Information Extraction; Simple tree matching; XPath;
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Engineering (ICIE), 2010 WASE International Conference on
Conference_Location
Beidaihe, Hebei
Print_ISBN
978-1-4244-7506-3
Electronic_ISBN
978-1-4244-7507-0
Type
conf
DOI
10.1109/ICIE.2010.100
Filename
5571205