DocumentCode
2828769
Title
Data extraction and annotation for dynamic Web pages
Author
Song, Hui ; Giri, Suraj ; Ma, Fanyuan
Author_Institution
Dept. of Comput. Sci. & Eng., Shanghai Jiao Tong Univ., China
fYear
2004
fDate
28-31 March 2004
Firstpage
499
Lastpage
502
Abstract
Many Web sites contain large sets of pages generated dynamically from a common template. Structured data extracted from these pages and semantically annotated are valuable for information systems. We propose a system, ADeaD, that automatically extracts data values from such Web pages and annotates the data schema. Experimental evaluation on many real Web page collections indicates that our algorithm correctly extracts the data and annotates the data schema.
Keywords
Web sites; information retrieval; semantic Web; data extraction; dynamic Web pages; semantic annotation; structured data; template; wrapper generation; Computer science; Data mining; Databases; Graphical user interfaces; HTML; Humans; Information systems; Internet; Web pages; Writing
fLanguage
English
Publisher
ieee
Conference_Titel
e-Technology, e-Commerce and e-Service, 2004. EEE '04. 2004 IEEE International Conference on
Print_ISBN
0-7695-2073-1
Type
conf
DOI
10.1109/EEE.2004.1287353
Filename
1287353