  • DocumentCode
    2539453
  • Title
    A Query Keywords Based Approach for Noisy Data Elimination
  • Author
    Wang, Ying-Kui ; Tan, Qian-Mao

  • Author_Institution
    Experimentation Teaching Center of Comput., Tianjin Univ., Tianjin, China
  • fYear
    2012
  • fDate
    12-14 Oct. 2012
  • Firstpage
    508
  • Lastpage
    510
  • Abstract
    Eliminating noisy data is important for information extraction on the deep web. In this paper, we propose a new approach called ENDW (Eliminating Noisy Data in Web pages), which uses query keywords and DOM tools to eliminate noisy data. Query keywords submitted to backend databases always appear in deep web pages. The boundary between the useful data region and the noisy data region is related to the positions where the query keywords appear. Once this boundary is found, the useful data region can be retained and the noisy data region eliminated. Our experiments show that the approach is effective and stable.
  • Keywords
    Internet; data handling; database management systems; query processing; DOM tools; ENDW; backend databases; deep Web pages; information extraction; noisy data elimination; noisy data region; query keywords based approach; useful data region; Data mining; Databases; HTML; Noise measurement; Visualization; Web pages; deep web; web information extraction
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Business Computing and Global Informatization (BCGIN), 2012 Second International Conference on
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-1-4673-4469-2
  • Type
    conf
  • DOI
    10.1109/BCGIN.2012.138
  • Filename
    6382579
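
The abstract's central idea, that the positions of the query keywords in the page mark the boundary of the useful data region, can be illustrated with a small sketch. This is not the ENDW algorithm itself, whose details the record does not give. It assumes one simple rule: the useful region is the lowest DOM node that contains every occurrence of a query keyword, and everything outside that node is treated as noise. The names `Node`, `DomBuilder`, `useful_region`, `extract`, and the sample `PAGE` are all hypothetical; only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class Node:
    """A minimal DOM-like node (hypothetical helper, not part of ENDW)."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []
        self.text = ""  # text directly inside this node

    def full_text(self):
        # Own text followed by descendants' text (not strict document order).
        return self.text + "".join(c.full_text() for c in self.children)


class DomBuilder(HTMLParser):
    """Builds a simple tree of Node objects from HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag, if there is one.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.current = node.parent

    def handle_data(self, data):
        self.current.text += data


def keyword_nodes(node, keywords):
    """Yield nodes whose own (direct) text contains any keyword."""
    own = node.text.lower()
    if any(k.lower() in own for k in keywords):
        yield node
    for child in node.children:
        yield from keyword_nodes(child, keywords)


def path_from_root(node):
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path[::-1]


def useful_region(root, keywords):
    """Assumed rule: lowest common ancestor of all keyword-bearing nodes."""
    hits = list(keyword_nodes(root, keywords))
    if not hits:
        return None
    region = root
    for level in zip(*(path_from_root(h) for h in hits)):
        if all(n is level[0] for n in level):
            region = level[0]
        else:
            break
    return region


def extract(html, keywords):
    builder = DomBuilder()
    builder.feed(html)
    builder.close()
    return useful_region(builder.root, keywords)


PAGE = """
<html><body>
  <div id="nav">Home | Login | Advertise</div>
  <div id="results">
    <p>Learning Python, 5th Edition</p>
    <p>Python Cookbook</p>
  </div>
  <div id="footer">Copyright notice</div>
</body></html>
"""

region = extract(PAGE, ["python"])
print(region.tag, region.attrs)
```

With only one keyword occurrence, this rule returns the single node that holds it, which is likely too narrow, and a keyword echoed in a navigation bar or search box would widen the region to include noise. A practical method would need extra handling for both cases.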