DocumentCode
3148354
Title
Algorithm Research for the Noise of Information Extraction Based Vision and DOM Tree
Author
Sun, Tieli ; Li, Zhiying ; Liu, Yanji ; Liu, Zhenghong
Author_Institution
Sch. of Comput. Sci., Northeast Normal Univ., Changchun, China
fYear
2009
fDate
15-16 May 2009
Firstpage
81
Lastpage
84
Abstract
Information extraction from Web sites is a relevant problem today and is usually performed by software modules called wrappers. This paper introduces the relevant information extraction technology and works on HTML pages to extract both the theme information and the content. Noise is removed first, using visual blocks in combination with the DOM tree; this vision-based DOM tree denoising method improves the efficiency of extraction.
Keywords
Web sites; hypermedia markup languages; information retrieval; trees (mathematics); HTML pages; information extraction; vision-based DOM tree denoising methods; wrappers software modules; Computer science; Computer science education; Data mining; Databases; HTML; Software algorithms; Software performance; Sun; Ubiquitous computing; Web pages; DOM tree; match technology; wrapper
fLanguage
English
Publisher
IEEE
Conference_Titel
Intelligent Ubiquitous Computing and Education, 2009 International Symposium on
Conference_Location
Chengdu
Print_ISBN
978-0-7695-3619-4
Type
conf
DOI
10.1109/IUCE.2009.47
Filename
5223346