DocumentCode :
2815009
Title :
WebView: a tool for retrieving internal structures and extracting information from HTML documents
Author :
Lim, Seung-Jin ; Ng, Yiu-Kai
Author_Institution :
Dept. of Comput. Sci., Brigham Young Univ., Provo, UT, USA
fYear :
1999
fDate :
1999
Firstpage :
71
Lastpage :
80
Abstract :
HTML is a well-accepted and widely used language for creating platform-independent documents to be posted on the Web, and HTML documents are semistructured in nature according to the HTML specification. We propose a tool, called WebView, which constructs the semistructured data graph (SDG) of an HTML document H to capture the internal structure of data embedded in H and in its (in)directly linked documents. On top of the SDG, WebView provides query processing capability for evaluating SQL-like queries that are posted against the SDG, i.e., the source document(s), for extracting information from the SDG. Existing methods for extracting structured information from certain HTML documents with static internal structure, such as wrappers and integrators for data warehousing, can benefit from WebView.
Keywords :
SQL; data structures; hypermedia markup languages; information resources; information retrieval; search engines; HTML documents; SQL; WebView; data structure; data warehousing; information retrieval; integrators; platform-independent documents; query processing; semistructured data graph; wrappers; Computer science; Data mining; Databases; HTML; Information retrieval; Pattern matching; Query processing; Read only memory; Stock markets; Warehousing
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Database Systems for Advanced Applications, 1999. Proceedings., 6th International Conference on
Conference_Location :
Hsinchu
Print_ISBN :
0-7695-0084-6
Type :
conf
DOI :
10.1109/DASFAA.1999.765738
Filename :
765738