DocumentCode
2539453
Title
A Query Keywords Based Approach for Noisy Data Elimination
Author
Wang, Ying-Kui ; Tan, Qian-Mao
Author_Institution
Experimentation Teaching Center of Comput., Tianjin Univ., Tianjin, China
fYear
2012
fDate
12-14 Oct. 2012
Firstpage
508
Lastpage
510
Abstract
It is important to eliminate noisy data for information extraction on the deep web. In this paper, we propose a new approach called ENDW (Eliminating Noisy Data in Web pages), based on query keywords and DOM tools, to eliminate noisy data. Query keywords submitted to backend databases always appear in deep web pages. The boundary between the useful data region and the noisy data region is related to the positions where the query keywords appear. Once this boundary is found, the useful data region can be retained and the noisy data region eliminated. Our experiments show that the approach is effective and stable.
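The abstract does not say how ENDW locates the boundary, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes that the useful data region can be approximated by the smallest DOM subtree containing every occurrence of the query keyword. All names (`Node`, `TreeBuilder`, `useful_region`, `text_of`, the sample `PAGE`) are hypothetical, and the parser assumes reasonably well-nested HTML.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag (partial list, enough for a sketch).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}

class Node:
    """Minimal DOM-like node (hypothetical helper, not part of ENDW)."""
    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []
        self.text = ""  # text found directly inside this element

class TreeBuilder(HTMLParser):
    """Builds a small tree; assumes the HTML is reasonably well nested."""
    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        self.current.text += data

def ancestors(node):
    """Return the path from the root down to `node` (root first)."""
    chain = []
    while node is not None:
        chain.append(node)
        node = node.parent
    return chain[::-1]

def keyword_nodes(node, keyword):
    """Collect every node whose direct text contains the keyword."""
    hits = [node] if keyword.lower() in node.text.lower() else []
    for child in node.children:
        hits.extend(keyword_nodes(child, keyword))
    return hits

def useful_region(html, keyword):
    """Assumed heuristic: deepest node that is an ancestor of all keyword hits."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    hits = keyword_nodes(builder.root, keyword)
    if not hits:
        return None
    region = builder.root
    for level in zip(*(ancestors(h) for h in hits)):
        if all(n is level[0] for n in level):
            region = level[0]
        else:
            break
    return region

def text_of(node):
    """Whitespace-normalized text of a subtree."""
    parts = [node.text] + [text_of(c) for c in node.children]
    return " ".join(" ".join(parts).split())

# Made-up sample page: navigation and footer play the role of noisy data.
PAGE = """
<html><body>
<div id="nav">Home | Login | Ads</div>
<div id="results">
  <p>Cheap laptop deals</p>
  <p>Gaming laptop review</p>
</div>
<div id="footer">Copyright</div>
</body></html>
"""

region = useful_region(PAGE, "laptop")
print(region.attrs.get("id"), "->", text_of(region))
```

Everything outside the returned subtree would be treated as noise under this assumption. A real deep web page would likely need a more robust parser and a rule for keywords that also appear in search boxes or page titles.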
Keywords
Internet; data handling; database management systems; query processing; DOM tools; ENDW; backend databases; deep Web pages; information extraction; noisy data elimination; noisy data region; query keywords based approach; useful data region; Data mining; Databases; HTML; Noise measurement; Visualization; Web pages; deep web; web information extraction
fLanguage
English
Publisher
IEEE
Conference_Titel
Business Computing and Global Informatization (BCGIN), 2012 Second International Conference on
Conference_Location
Shanghai
Print_ISBN
978-1-4673-4469-2
Type
conf
DOI
10.1109/BCGIN.2012.138
Filename
6382579