DocumentCode :
1809885
Title :
The XML-based Information Extraction on Data-intensive Page
Author :
Li, Yanheng
Author_Institution :
Dalian Maritime Univ., Dalian
fYear :
2007
fDate :
18-21 Sept. 2007
Firstpage :
1027
Lastpage :
1030
Abstract :
This paper puts forward an XML-based information extraction method that applies XSLT and XPath to construct extraction rules. The aim of the method is to extract useful information from data-intensive pages. The paper first analyzes the traits of data-intensive pages. Based on those traits, we propose a path induction method that infers the record pattern of a page, obtains the path expressions of the useful information, and finally constructs the extraction rules. The paper also presents a method for optimizing the extraction rules in order to obtain more robust rules.
Keywords :
XML; knowledge acquisition; XML-based information extraction; XPath; XSLT; data-intensive page; data-intensive pages; extraction rules; path expression; path induction method; record pattern; Computer networks; Concurrent computing; Data mining; Databases; HTML; Optimization methods; Parallel processing; Robustness; Web pages
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Network and Parallel Computing Workshops, 2007. NPC Workshops. IFIP International Conference on
Conference_Location :
Liaoning
Print_ISBN :
978-0-7695-2943-1
Type :
conf
DOI :
10.1109/NPC.2007.153
Filename :
4351622