DocumentCode :
3618181
Title :
Towards building a MetaQuerier: extracting and matching Web query interfaces
Author :
B. He;Z. Zhang;K.C.-C. Chang
Author_Institution :
Dept. of Comput. Sci., Illinois Univ., Urbana, IL, USA
fYear :
2005
fDate :
2005
Firstpage :
1098
Lastpage :
1099
Abstract :
We are witnessing the rapid growth, and thus the prevalence, of databases on the Web. Our study in April 2004 estimated that there were 450,000 online databases. On this deep Web, myriad databases provide dynamic, query-based data access through their query interfaces instead of static URL links. Integrating these query interfaces is therefore essential to integrating the deep Web. The MetaQuerier project aims to open up the deep Web to users by building a system that helps them explore and integrate deep Web sources. As a starting point, we focus on integrating deep Web sources in the same domain, which is itself an important integration task. To automate this integration scenario, we need to solve two critical problems: extracting query interfaces and matching query interfaces. To solve the interface extraction problem, we introduce a parsing paradigm that hypothesizes the existence of a hidden syntax describing the layout and semantics of Web interfaces. For matching, unlike traditional pairwise schema matching, we propose a holistic approach that matches all schemas at the same time under the hypothesis of a hidden schema model. Our techniques thus explore, in essence, "data mining for information integration": we mine the observable information to discover the underlying semantics.
Keywords :
"Data mining","Databases","Books","Web pages","HTML","Helium","Computer science","Uniform resource locators","Large-scale systems","Large scale integration"
Publisher :
IEEE
Conference_Title :
Data Engineering, 2005. ICDE 2005. Proceedings. 21st International Conference on
ISSN :
1084-4627
Print_ISBN :
0-7695-2285-8
Type :
conf
DOI :
10.1109/ICDE.2005.145
Filename :
1410219