DocumentCode :
2639076
Title :
Web information extraction based on news domain ontology theory
Author :
Shi, Junfang ; Li Liu
Author_Institution :
Sch. of Inf. Eng., Univ. of Sci. & Technol. Beijing, Beijing, China
fYear :
2010
fDate :
16-17 Aug. 2010
Firstpage :
416
Lastpage :
419
Abstract :
Because current web information extraction methods cannot adapt to varied page structures, this paper proposes a Web information extraction method based on a news domain ontology. Information extraction rules generated from the news domain ontology are used to locate the relevant page areas accurately and to extract the information of interest precisely. An information extraction system based on the news domain ontology is implemented using page processing, page conversion, XPath, and related techniques. Testing on news site pages shows that the proposed approach does not rely on page structure and can increase the recall and precision of information extraction.
Keywords :
Internet; information retrieval; Web information extraction; XPath; information extraction rules; information extraction system; news domain ontology theory; page conversion; page processing; page structures; Data mining; HTML; Navigation; Ontologies; Pattern matching; Web pages; XML;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Web Society (SWS), 2010 IEEE 2nd Symposium on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-6356-5
Type :
conf
DOI :
10.1109/SWS.2010.5607416
Filename :
5607416