Regional Information Center for Science and Technology - Web information extraction based on news domain ontology theory

DocumentCode :

2639076

Title :

Web information extraction based on news domain ontology theory

Author :

Shi, Junfang ; Li Liu

Author_Institution :

Sch. of Inf. Eng., Univ. of Sci. & Technol. Beijing, Beijing, China

fYear :

2010

fDate :

16-17 Aug. 2010

Firstpage :

416

Lastpage :

419

Abstract :

Because current web information extraction methods cannot adapt to varied page structures, this paper proposes a web information extraction method based on a news domain ontology. Extraction rules generated from the news domain ontology are used to locate the relevant page regions accurately and to extract the information of interest. An information extraction system based on the news domain ontology is implemented using page processing, page conversion, XPath, and related techniques. Tests on a news site show that the proposed approach does not rely on page structure and can increase the recall and precision of information extraction.
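
The abstract names XPath-based extraction rules but gives no detail on their form. As a rough illustration only, the sketch below assumes a page already converted to well-formed XHTML and a hypothetical rule table that maps concept names to XPath expressions. The names `RULES`, `extract`, and `PAGE`, the concepts, and the sample markup are all assumptions made for this example and are not taken from the paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: concept name -> XPath expression.
# In the paper such rules are generated from a news domain ontology;
# here they are hand-written purely for illustration.
RULES = {
    "title": ".//h1",
    "date": ".//span[@class='date']",
    "body": ".//div[@class='content']/p",
}

# Hypothetical sample page, assumed to be well-formed XHTML already
# (standing in for the output of a page conversion step).
PAGE = """<html><body>
<h1>Sample headline</h1>
<span class="date">2010-08-16</span>
<div class="content"><p>First paragraph.</p><p>Second <b>paragraph</b>.</p></div>
</body></html>"""


def extract(xhtml: str, rules: dict[str, str]) -> dict[str, list[str]]:
    """Apply each XPath rule to the page and collect the matched text."""
    root = ET.fromstring(xhtml)
    result = {}
    for concept, xpath in rules.items():
        # itertext() gathers text from nested inline elements such as <b>.
        result[concept] = [
            "".join(node.itertext()).strip() for node in root.findall(xpath)
        ]
    return result


if __name__ == "__main__":
    for concept, values in extract(PAGE, RULES).items():
        print(concept, values)
```

The standard-library `xml.etree.ElementTree` module supports only a subset of XPath, which is enough for the simple tag and attribute predicates used here; a fuller XPath engine would be needed for more complex rules.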

Keywords :

Internet; information retrieval; Web information extraction; XPath; information extraction rules; information extraction system; news domain ontology theory; page conversion; page processing; page structures; Data mining; HTML; Navigation; Ontologies; Pattern matching; Web pages; XML

fLanguage :

English

Publisher :

IEEE

Conference_Titel :

Web Society (SWS), 2010 IEEE 2nd Symposium on

Conference_Location :

Beijing

Print_ISBN :

978-1-4244-6356-5

Type :

conf

DOI :

10.1109/SWS.2010.5607416

Filename :

5607416

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2639076