Title :
Domain-oriented Deep Web Data Sources' Discovery and Identification
Author :
Li, Yingjun ; Nie, Tiezheng ; Shen, Derong ; Yu, Ge
Author_Institution :
Inst. of Comput. Software & Theor., Northeastern Univ. Shenyang, Shenyang, China
Abstract :
The Deep Web contains a tremendous number of well-structured data sources, and integrating them has become an active research topic. Accurately discovering and identifying the Deep Web data sources related to a specific domain is a key issue. We propose a Domain-Oriented Deep Web data source Discovery method (DO-DWD) and a novel Domain Identification strategy for Deep Web data sources (DIDW). In the discovery stage, we use machine learning algorithms and heuristic rules to find the query interfaces of data sources. In the identification stage, we identify the Deep Web data sources associated with the domain by calculating the relevance between a query interface and the domain based on semantic similarity. Extensive experiments on a real data set show that DO-DWD and DIDW achieve high correctness and accuracy.
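The identification stage described in the abstract can be illustrated with a minimal sketch. The record does not say which similarity measure, labels, or threshold DIDW uses, so everything here is an assumption: the function names are hypothetical, the threshold of 0.6 is arbitrary, and plain string similarity from Python's `difflib` stands in for a real semantic similarity measure.

```python
from difflib import SequenceMatcher


def term_similarity(a: str, b: str) -> float:
    """Stand-in for semantic similarity: string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def interface_relevance(labels, domain_terms):
    """Average, over the interface labels, of the best match to any domain term."""
    if not labels or not domain_terms:
        return 0.0
    best = [max(term_similarity(label, term) for term in domain_terms)
            for label in labels]
    return sum(best) / len(best)


def is_domain_source(labels, domain_terms, threshold=0.6):
    """Treat a query interface as in-domain when its relevance reaches the threshold."""
    return interface_relevance(labels, domain_terms) >= threshold


# Hypothetical domain vocabulary and query-interface labels.
book_domain = ["title", "author", "isbn", "publisher"]
book_form = ["Title", "Author", "ISBN"]
flight_form = ["departure", "arrival", "passengers"]

print(is_domain_source(book_form, book_domain))
print(is_domain_source(flight_form, book_domain))
```

Averaging the best match per label keeps the score in [0, 1] and lets a single threshold separate in-domain interfaces from the rest; a measure based on a lexical resource could replace `term_similarity` without changing the surrounding logic.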
Keywords :
Internet; learning (artificial intelligence); query processing; user interfaces; data integration; data sources; domain identification strategy; domain-oriented deep Web data source discovery; machine learning algorithms; query interfaces; semantic similarity; Data engineering; Data mining; Databases; Educational institutions; Information science; Internet; Machine learning algorithms; Probes; Radio control; Software;
Conference_Title :
Web Conference (APWEB), 2010 12th International Asia-Pacific
Conference_Location :
Busan
Print_ISBN :
978-0-7695-4012-2
Electronic_ISBN :
978-1-4244-6600-9
DOI :
10.1109/APWeb.2010.54