DocumentCode
1877367
Title
Web Information Extraction Algorithm Based on Ontology and DOM Tree
Author
Liu, Li ; Shi, Junfang ; Liu, Xinrui
Author_Institution
Sch. of Inf. Eng., Univ. of Sci. & Technol. Beijing, Beijing, China
fYear
2010
fDate
10-12 Dec. 2010
Firstpage
1
Lastpage
4
Abstract
Because information on the Web is vast, dynamic, and irregular, it is difficult to search and integrate. This paper proposes a Web information extraction algorithm based on ontology and the DOM tree. Extraction rules generated from the ontology are used to locate the relevant regions of a page accurately and to extract the information of interest. The algorithm carries out extraction by traversing the DOM tree. Finally, we implement an information extraction system and test its performance on a news site. The test results show that the algorithm does not rely on page structure and can increase the recall and precision of information extraction.
Keywords
Internet; information retrieval; ontologies (artificial intelligence); tree data structures; DOM tree; Web information extraction algorithm; news site; ontology; Data mining; HTML; Heuristic algorithms; Navigation; Ontologies; Web pages; XML
fLanguage
English
Publisher
IEEE
Conference_Titel
Computational Intelligence and Software Engineering (CiSE), 2010 International Conference on
Conference_Location
Wuhan
Print_ISBN
978-1-4244-5391-7
Electronic_ISBN
978-1-4244-5392-4
Type
conf
DOI
10.1109/CISE.2010.5677052
Filename
5677052