Title :
Towards building a MetaQuerier: extracting and matching Web query interfaces
Author :
B. He;Z. Zhang;K.C.-C. Chang
Author_Institution :
Dept. of Comput. Sci., Illinois Univ., Urbana, IL, USA
fDate :
2005
Abstract :
We are witnessing the rapid growth, and thus the prevalence, of databases on the Web. Our recent study in April 2004 estimated 450,000 online databases. On this deep Web, myriad databases provide dynamic, query-based data access through their query interfaces instead of static URL links. Integrating these query interfaces is thus essential to integrating the deep Web. The MetaQuerier project aims to open up the deep Web to users by building a system that helps them explore and integrate deep Web sources. As a starting point, we focus on integrating deep Web sources in the same domain, which is itself an important integration task. To automate this integration scenario, we need to solve two critical problems: extracting query interfaces and matching query interfaces. To solve the interface extraction problem, we introduce a parsing paradigm that hypothesizes the existence of a hidden syntax describing the layout and semantics of Web interfaces. For matching, unlike traditional pairwise schema matching, we propose a holistic approach that matches all schemas at the same time under the hypothesis of a hidden schema model. Our techniques thus explore, in essence, "data mining for information integration": we mine the observable information to discover the underlying semantics.
Keywords :
"Data mining","Databases","Books","Web pages","HTML","Helium","Computer science","Uniform resource locators","Large-scale systems","Large scale integration"
Conference_Titel :
Proceedings of the 21st International Conference on Data Engineering (ICDE 2005)
Print_ISBN :
0-7695-2285-8
DOI :
10.1109/ICDE.2005.145