Title :
Web information extraction based on news domain ontology theory
Author :
Shi, Junfang ; Li Liu
Author_Institution :
School of Information Engineering, University of Science and Technology Beijing, Beijing, China
Abstract :
Because current Web information extraction methods cannot adapt to the wide variety of page structures, this paper proposes a Web information extraction method based on a news domain ontology. Information extraction rules generated from the news domain ontology are used to locate the relevant regions of a page accurately and to extract the information of interest precisely. The resulting information extraction system is implemented with page processing, page conversion, XPath, and related techniques. Tests on news sites show that the proposed approach does not depend on page structure and can improve both the recall and the precision of information extraction.
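The abstract does not specify the ontology's concepts or the format of the generated rules. As a minimal sketch under stated assumptions, the pipeline it describes (convert the page into a well-formed tree, generate rules from the ontology, apply the rules as XPath expressions) could look like the Python below. `NEWS_ONTOLOGY`, `PageConverter`, `generate_rules`, and `extract` are hypothetical names, and the assumption that each concept is signalled by a `class` or `id` label is illustrative only; this is not the authors' system. Python's `xml.etree.ElementTree` supports only a subset of XPath 1.0, which is enough for the `.//*[@attr='value']` rules used here.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML elements that never take a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}

# Hypothetical news ontology: each concept lists the class/id labels
# assumed to mark it on a page (an illustration, not the paper's ontology).
NEWS_ONTOLOGY = {
    "title": ["title", "headline"],
    "time": ["time", "pubdate", "date"],
    "content": ["content", "article-body"],
}


class PageConverter(HTMLParser):
    """Page conversion: build a well-formed ElementTree from loose HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.builder = ET.TreeBuilder()
        self.stack = []                    # currently open, non-void tags
        self.builder.start("root", {})     # single synthetic root element

    def handle_starttag(self, tag, attrs):
        self.builder.start(tag, {k: (v or "") for k, v in attrs})
        if tag in VOID_TAGS:
            self.builder.end(tag)          # void elements close immediately
        else:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:          # ignore stray end tags
            return
        while self.stack:                  # auto-close unclosed children
            open_tag = self.stack.pop()
            self.builder.end(open_tag)
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.builder.data(data)

    def to_tree(self):
        while self.stack:                  # close anything left open
            self.builder.end(self.stack.pop())
        self.builder.end("root")
        return self.builder.close()


def generate_rules(ontology):
    """Turn each concept's labels into ElementTree XPath expressions.

    Note: [@class='x'] is an exact match, not a 'contains class' test.
    """
    return {concept: [f".//*[@{attr}='{label}']"
                      for label in labels for attr in ("class", "id")]
            for concept, labels in ontology.items()}


def extract(html, rules):
    """Apply the rules; the first matching node per concept wins."""
    conv = PageConverter()
    conv.feed(html)
    conv.close()                           # flush HTMLParser's buffer
    root = conv.to_tree()
    result = {}
    for concept, paths in rules.items():
        for path in paths:
            node = root.find(path)
            if node is not None:
                # Join text fragments with spaces, then collapse whitespace.
                result[concept] = " ".join(" ".join(node.itertext()).split())
                break
    return result


if __name__ == "__main__":
    page = """<html><body><div class="nav">Home | World</div>
<h1 class="headline">Flood warning issued</h1>
<span id="pubdate">2010-08-26</span>
<div class="article-body"><p>Heavy rain is expected.<br>Stay alert.</p></div>
</body></html>"""
    print(extract(page, generate_rules(NEWS_ONTOLOGY)))
```

Building the tree with the standard-library `HTMLParser` keeps the sketch dependency-free; a production system would more likely use a dedicated HTML cleaner and a full XPath 1.0 engine (for example `lxml`), which also allow predicates such as `contains(@class, ...)`.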
Keywords :
Internet; information retrieval; Web information extraction; XPath; information extraction rules; information extraction system; news domain ontology theory; page conversion; page processing; page structures; Data mining; HTML; Navigation; Ontologies; Pattern matching; Web pages; XML
Conference_Title :
2010 IEEE 2nd Symposium on Web Society (SWS)
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-6356-5
DOI :
10.1109/SWS.2010.5607416