DocumentCode
430254
Title
Context Generalization for Information Extraction from the Web
Author
Habegger, Benjamin ; Quafafou, Mohamed
Author_Institution
LINA, France
fYear
2004
fDate
20-24 Sept. 2004
Firstpage
720
Lastpage
723
Abstract
Many online data sources, such as product catalogs and online directories, are available on the web. Extracting information from such sources is a hard task because they are designed to be presented to human users. Many researchers have tackled the problem of building wrappers for such sources. The state-of-the-art approach is to use machine learning techniques based on fully labeled example pages. In this paper we propose and study an approach based on example instances. This allows the user to build a wrapper using only a handful of examples taken from across the whole source, so that structural differences can be taken into account. The resulting patterns make it possible to extract the instances of the relation described by the examples and contained in the same data source.
Keywords
Application software; Buildings; Catalogs; Data mining; Humans; Induction generators; Labeling; Machine learning
fLanguage
English
Publisher
ieee
Conference_Titel
Web Intelligence, 2004. WI 2004. Proceedings. IEEE/WIC/ACM International Conference on
Print_ISBN
0-7695-2100-2
Type
conf
DOI
10.1109/WI.2004.10076
Filename
1410905