DocumentCode :
2568490
Title :
Reverse Method for Labeling the Information from Semi-Structured Web Pages
Author :
Akbar, Z. ; Handoko, L.T.
Author_Institution :
Group for Theor. & Comput. Phys., Indonesian Inst. of Sci., Tangerang, Indonesia
fYear :
2009
fDate :
15-17 May 2009
Firstpage :
551
Lastpage :
555
Abstract :
We propose a new technique for inferring the structure of, and extracting data tokens from, semi-structured Web sources that are generated from a consistent template or layout with some implicit regularities. The attributes are extracted and labeled in reverse, starting from the region of interest of the targeted content. This contrasts with existing techniques, which always generate the trees from the root. We argue and show that our technique is simpler, more accurate, and more effective, especially for detecting changes in the templates of the targeted Web pages.
Keywords :
Internet; information retrieval; tree data structures; attribute extraction; consistent template; implicit regularity; information labeling; reverse method; semi structured Web page; token extraction; trees; Data mining; Databases; Humans; Information systems; Labeling; Physics computing; Search engines; Signal processing; Uniform resource locators; Web pages; data extraction; data mining; web-based information system;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
2009 International Conference on Signal Processing Systems
Conference_Location :
Singapore
Print_ISBN :
978-0-7695-3654-5
Type :
conf
DOI :
10.1109/ICSPS.2009.86
Filename :
5166847