• DocumentCode
    2053578
  • Title

    Web Data Extraction Based on Simple Tree Matching

  • Author

    Wang, Hua ; Zhang, Yang

  • Author_Institution
    Coll. of Inf. Eng., Northwest A&F Univ., Yangling, China
  • Volume
    2
  • fYear
    2010
  • fDate
    14-15 Aug. 2010
  • Firstpage
    15
  • Lastpage
    18
  • Abstract
    Information on the Internet has grown exponentially, and Internet users are overwhelmed by it. How to automatically extract useful information from relevant pages, so as to provide users with a convenient and rapid information query platform, is an important issue. In this paper, we present a Web data extraction method based on the simple tree matching algorithm, which analyzes the structure and content of Web documents. Experimental results on Web data from several well-known websites show that the proposed method can effectively extract data records from similar Web pages, with an extraction precision of about 90%, and can meet the requirement of extracting accurate data in real-life applications.
  • Keywords
    Web services; data mining; query processing; trees (mathematics); Internet; Web data extraction method; Web documents; Web pages; Web sites; information query platform; simple tree matching algorithm; Artificial intelligence; Books; Data mining; Feature extraction; HTML; Heuristic algorithms; Web pages; DOM; Information Extraction; Simple tree matching; XPath;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Engineering (ICIE), 2010 WASE International Conference on
  • Conference_Location
    Beidaihe, Hebei
  • Print_ISBN
    978-1-4244-7506-3
  • Electronic_ISBN
    978-1-4244-7507-0
  • Type

    conf

  • DOI
    10.1109/ICIE.2010.100
  • Filename
    5571205