• DocumentCode
    430254
  • Title
    Context Generalization for Information Extraction from the Web
  • Author
    Habegger, Benjamin ; Quafafou, Mohamed
  • Author_Institution
    LINA, France
  • fYear
    2004
  • fDate
    20-24 Sept. 2004
  • Firstpage
    720
  • Lastpage
    723
  • Abstract
    Many online data sources, such as product catalogs and online directories, are available on the web. Extracting information from such sources is a hard task, since they are designed to be presented to human users. Many researchers have tackled the problem of building wrappers for such sources. The state-of-the-art approach is to use machine learning techniques based on fully labeled example pages. In this paper we propose and study an approach based on example instances. This allows the user to build a wrapper from only a handful of examples drawn from the whole source, which makes it possible to account for structural differences. The resulting patterns make it possible to extract the instances of the relation that are described by the examples and contained in the same data source.
  • Keywords
    Application software; Buildings; Catalogs; Data mining; Humans; Induction generators; Labeling; Machine learning
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence, 2004. WI 2004. Proceedings. IEEE/WIC/ACM International Conference on
  • Print_ISBN
    0-7695-2100-2
  • Type
    conf
  • DOI
    10.1109/WI.2004.10076
  • Filename
    1410905