DocumentCode
2966991
Title
Semi-Automated Wrappers Using Rule Trees
Author
Iasinschi, Adrian ; Cosulschi, Mirel
Author_Institution
Fac. of Math. & Comput. Sci., Univ. of Craiova, Craiova, Romania
fYear
2008
fDate
26-29 Sept. 2008
Firstpage
209
Lastpage
215
Abstract
In this paper we describe the concept of a semi-automated wrapper for extracting information from semi-structured pages, usually part of data-intensive e-commerce web sites. The process is based on creating extraction rules visually, using the DOM tree associated with an XHTML document to help the user make the right decisions. The extraction rules have a natural tree structure. Based on the resulting model, the wrapper can then navigate through the site and extract the relevant data.
Keywords
Web sites; electronic commerce; information retrieval; tree data structures; DOM tree; XHTML document; data intensive Web sites; e-commerce; information extraction; rule trees; semiautomated wrapper; semistructured pages; tree structure; Competitive intelligence; Computer science; Data mining; HTML; Humans; Java; Mathematics; Scientific computing; Web pages; XML; rule; semi-automated wrapper; tree; web data extraction;
fLanguage
English
Publisher
IEEE
Conference_Titel
Symbolic and Numeric Algorithms for Scientific Computing, 2008. SYNASC '08. 10th International Symposium on
Conference_Location
Timisoara
Print_ISBN
978-0-7695-3523-4
Type
conf
DOI
10.1109/SYNASC.2008.67
Filename
5204813