• DocumentCode
    2828769
  • Title
    Data extraction and annotation for dynamic Web pages
  • Author
    Song, Hui; Giri, Suraj; Ma, Fanyuan
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Shanghai Jiao Tong Univ., China
  • fYear
    2004
  • fDate
    28-31 March 2004
  • Firstpage
    499
  • Lastpage
    502
  • Abstract
    Many Web sites contain large sets of pages generated dynamically from a common template. The structured data extracted from these pages, together with semantic annotation, are valuable for information systems. We propose a system, ADeaD, that automatically extracts data values from these Web pages and annotates the data schema. Experimental evaluation on many real Web page collections indicates that our algorithm correctly extracts the data and annotates the data schema.
  • Keywords
    Web sites; information retrieval; semantic Web; data extraction; dynamic Web pages; semantic annotation; structured data; template; wrapper generation; Computer science; Data mining; Databases; Graphical user interfaces; HTML; Humans; Information systems; Internet; Web pages; Writing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    2004 IEEE International Conference on e-Technology, e-Commerce and e-Service (EEE '04)
  • Print_ISBN
    0-7695-2073-1
  • Type
    conf
  • DOI
    10.1109/EEE.2004.1287353
  • Filename
    1287353