Regional Information Center for Science and Technology - The XML-based Information Extraction on Data-intensive Page

DocumentCode :

1809885

Title :

The XML-based Information Extraction on Data-intensive Page

Author :

Li, Yanheng

Author_Institution :

Dalian Maritime Univ., Dalian

fYear :

2007

fDate :

18-21 Sept. 2007

Firstpage :

1027

Lastpage :

1030

Abstract :

This paper puts forward an XML-based information extraction method that uses XSLT and XPath to construct extraction rules. The aim of this method is to extract useful information from data-intensive pages. The paper first analyzes the traits of data-intensive pages. To exploit those traits, we propose a path induction method that infers the record pattern of a page, derives the path expressions that locate the useful information, and finally constructs extraction rules from them. Furthermore, the paper presents a method for optimizing the extraction rules so that they are more robust.
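The idea of inducing a general path from the record pattern of a data-intensive page can be illustrated with a minimal sketch. It is not the author's algorithm: it assumes the page has already been cleaned into well-formed XHTML and that a few example paths to the wanted field are known. Where the examples disagree on a positional predicate (the repeated-record axis, such as `tr[1]` vs. `tr[3]`), the predicate is dropped, and where they agree it is kept. The function names `induce_path` and `extract` and the sample `PAGE` are hypothetical, and the standard library's `xml.etree.ElementTree` XPath subset stands in for the XSLT processing described in the abstract.

```python
import re
import xml.etree.ElementTree as ET

# One location step: an element name with an optional 1-based position, e.g. "td[2]".
STEP = re.compile(r"^([A-Za-z_][\w.-]*)(?:\[(\d+)\])?$")


def parse_path(path):
    """Split an absolute path like /html/body/tr[2]/td[1] into (tag, index) steps."""
    steps = []
    for part in path.strip("/").split("/"):
        m = STEP.match(part)
        if m is None:
            raise ValueError(f"unsupported step: {part!r}")
        steps.append((m.group(1), int(m.group(2)) if m.group(2) else None))
    return steps


def induce_path(example_paths):
    """Generalize example paths into one path expression (hypothetical helper).

    Steps on which every example agrees keep their position predicate;
    steps whose positions differ are treated as the repeated-record axis
    and lose the predicate so the path matches every record.
    """
    parsed = [parse_path(p) for p in example_paths]
    if len({len(p) for p in parsed}) != 1:
        raise ValueError("example paths must have the same depth")
    out = []
    for column in zip(*parsed):
        tags = {tag for tag, _ in column}
        if len(tags) != 1:
            raise ValueError("example paths disagree on element names")
        tag = tags.pop()
        idxs = {idx for _, idx in column}
        if len(idxs) == 1 and None not in idxs:
            out.append(f"{tag}[{idxs.pop()}]")  # same position in all examples
        else:
            out.append(tag)  # varying position: generalize over records
    return "/" + "/".join(out)


def extract(xhtml, path):
    """Apply an induced absolute path to an XHTML string and return the text values."""
    root = ET.fromstring(xhtml)
    steps = path.strip("/").split("/")
    if steps[0].split("[")[0] != root.tag:
        return []
    # ElementTree cannot evaluate absolute paths on an element, so make the
    # path relative to the root element.
    rel = "./" + "/".join(steps[1:])
    return [(el.text or "").strip() for el in root.findall(rel)]


# Hypothetical data-intensive page: one record per table row.
PAGE = """<html><body><table>
<tr><td>Item A</td><td>10</td></tr>
<tr><td>Item B</td><td>20</td></tr>
<tr><td>Item C</td><td>30</td></tr>
</table></body></html>"""

if __name__ == "__main__":
    rule = induce_path(["/html/body/table/tr[1]/td[2]", "/html/body/table/tr[3]/td[2]"])
    print(rule)                 # /html/body/table/tr/td[2]
    print(extract(PAGE, rule))  # ['10', '20', '30']
```

Two labeled examples from different rows are enough here to generalize over every row while keeping the column position fixed. A rule-optimization step of the kind the abstract mentions would go further, for example by replacing fragile positional predicates with more stable anchors, but that refinement is not sketched here.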

Keywords :

XML; knowledge acquisition; XML-based information extraction; XPath; XSLT; data-intensive page; data-intensive pages; extraction rules; path expression; path induction method; record pattern; Computer networks; Concurrent computing; Data mining; Databases; HTML; Optimization methods; Parallel processing; Robustness; Web pages; XML;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Network and Parallel Computing Workshops, 2007. NPC Workshops. IFIP International Conference on

Conference_Location :

Liaoning

Print_ISBN :

978-0-7695-2943-1

Type :

conf

DOI :

10.1109/NPC.2007.153

Filename :

4351622

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1809885