• DocumentCode
    3148354
  • Title

    Algorithm Research for the Noise of Information Extraction Based Vision and DOM Tree

  • Author

    Sun, Tieli ; Li, Zhiying ; Liu, Yanji ; Liu, Zhenghong

  • Author_Institution
    Sch. of Comput. Sci., Northeast Normal Univ., Changchun, China
  • fYear
    2009
  • fDate
    15-16 May 2009
  • Firstpage
    81
  • Lastpage
    84
  • Abstract
    Information extraction from Web sites is a relevant problem today and is usually performed by software modules called wrappers. This paper introduces the relevant information extraction technology and describes how the theme and contents of HTML pages are extracted. Noise is removed first, by combining visual blocks with a vision-based DOM tree denoising method, in order to improve the efficiency of extraction.
  • Keywords
    Web sites; hypermedia markup languages; information retrieval; trees (mathematics); HTML pages; information extraction; vision-based DOM tree denoising methods; wrappers software modules; Computer science; Computer science education; Data mining; Databases; HTML; Software algorithms; Software performance; Sun; Ubiquitous computing; Web pages; DOM tree; match technology; wrapper
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Ubiquitous Computing and Education, 2009 International Symposium on
  • Conference_Location
    Chengdu
  • Print_ISBN
    978-0-7695-3619-4
  • Type

    conf

  • DOI
    10.1109/IUCE.2009.47
  • Filename
    5223346