DocumentCode :
1955298
Title :
XPath-Wrapper Induction for Data Extraction
Author :
Tran, Nam-Khanh ; Pham, Kim-Cuong ; Ha, Quang-Thuy
Author_Institution :
Coll. of Eng. & Technol., Vietnam Nat. Univ., Hanoi, Vietnam
fYear :
2010
fDate :
28-30 Dec. 2010
Firstpage :
150
Lastpage :
153
Abstract :
The Web contains an enormous amount of information formatted for human readers, which makes it difficult for computers to extract relevant content from various sources. This paper presents an XPath-wrapper induction algorithm that leverages user queries and template-based sites to extract structured information. Our experiments show an average accuracy of 94%.
Keywords :
Internet; query processing; Web; data extraction; template-based sites; user queries; xpath-wrapper induction algorithm; Data mining; Feature extraction; HTML; Portable computers; Search engines; Web pages;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Asian Language Processing (IALP), 2010 International Conference on
Conference_Location :
Harbin
Print_ISBN :
978-1-4244-9063-9
Type :
conf
DOI :
10.1109/IALP.2010.33
Filename :
5681601