Title :
XPath-Wrapper Induction for Data Extraction
Author :
Tran, Nam-Khanh ; Pham, Kim-Cuong ; Ha, Quang-Thuy
Author_Institution :
Coll. of Eng. & Technol., Vietnam Nat. Univ., Hanoi, Vietnam
Abstract :
The Web contains an enormous amount of information that is formatted for human readers, which makes it difficult for computers to extract relevant content from various sources. This paper presents an XPath-wrapper induction algorithm that leverages user queries and template-based sites to extract structured information. Our experiments show an average accuracy of 94%.
Keywords :
Internet; query processing; Web; data extraction; template-based sites; user queries; XPath-wrapper induction algorithm; Data mining; Feature extraction; HTML; Portable computers; Search engines; Web pages
Conference_Title :
Asian Language Processing (IALP), 2010 International Conference on
Conference_Location :
Harbin
Print_ISBN :
978-1-4244-9063-9
DOI :
10.1109/IALP.2010.33