• DocumentCode
    495194
  • Title
    Web Data Extraction Based on Label Library
  • Author
    Tan, Shoubiao ; Fan, Jin ; Jiang, Yuan
  • Author_Institution
    Anhui Univ., Hefei, China
  • Volume
    5
  • fYear
    2009
  • fDate
    March 31 2009-April 2 2009
  • Firstpage
    134
  • Lastpage
    138
  • Abstract
    This paper proposes a Web data extraction technique based on a label library for extracting information from data-intensive Web pages. It uses the label library to eliminate conceptual ambiguity in the contents of Web pages, mines data regions using the MDR repeated-pattern discovery algorithm, and then recognizes the structure of those regions and extracts data from them through a novel hierarchical pattern recognition and data extraction algorithm. Experiments showed that the technique performs very well.
  • Keywords
    Internet; data mining; information retrieval; MDR repeated patterns discovery algorithm; Web data extraction; Web pages; hierarchic pattern recognition; information extraction; label library; Computer science; Data engineering; Information resources; Labeling; Libraries; Pattern recognition; Programming profession; Writing; Web information extraction; data intensive Web pages
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science and Information Engineering, 2009 WRI World Congress on
  • Conference_Location
    Los Angeles, CA
  • Print_ISBN
    978-0-7695-3507-4
  • Type
    conf
  • DOI
    10.1109/CSIE.2009.595
  • Filename
    5170512