  • DocumentCode
    2966991
  • Title
    Semi-Automated Wrappers Using Rule Trees
  • Author
    Iasinschi, Adrian; Cosulschi, Mirel
  • Author_Institution
    Fac. of Math. & Comput. Sci., Univ. of Craiova, Craiova, Romania
  • fYear
    2008
  • fDate
    26-29 Sept. 2008
  • Firstpage
    209
  • Lastpage
    215
  • Abstract
    In this paper we describe the concept of a semi-automated wrapper for extracting information from semi-structured pages, typically those of data-intensive e-commerce web sites. The process is based on creating extraction rules visually, using the DOM tree associated with an XHTML document, which helps the user make the right decisions. The extraction rules defined have a natural tree structure. Using the resulting model, the wrapper can then navigate through the site and extract the relevant data.
  • Keywords
    Web sites; electronic commerce; information retrieval; tree data structures; DOM tree; XHTML document; data intensive Web sites; e-commerce; information extraction; rule trees; semiautomated wrapper; semistructured pages; tree structure; Competitive intelligence; Computer science; Data mining; HTML; Humans; Java; Mathematics; Scientific computing; Web pages; XML; rule; semi-automated wrapper; tree; web data extraction;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Symbolic and Numeric Algorithms for Scientific Computing, 2008. SYNASC '08. 10th International Symposium on
  • Conference_Location
    Timisoara
  • Print_ISBN
    978-0-7695-3523-4
  • Type
    conf
  • DOI
    10.1109/SYNASC.2008.67
  • Filename
    5204813