Title :
Data Source Selection for Large-Scale Deep Web Data Integration
Author :
Xian, Xuefeng ; Zhao, Pengpeng ; Fang, Wei ; Xin, Jie ; Cui, Zhiming
Author_Institution :
Inst. of Intell. Inf. Process. & Applic., Soochow Univ., Suzhou, China
Abstract :
The deep Web has become an important resource on the Web because of its rich, high-quality information, giving rise to a new application area in data mining and integration. Hundreds or thousands of data sources may provide data relevant to a particular domain on the Web, so a primary challenge in large-scale deep Web data integration is determining the order in which users should integrate candidate data sources. In this paper, we develop a most-benefit approach (MBA) for ordering candidate data sources for user integration. At the core of this approach is a utility function that quantifies the utility of a given state of the integration system; accordingly, we devise a utility function for the integration system based on the number of query results. We show in practice how to apply MBA efficiently in concert with this utility function to order data sources. A detailed experimental evaluation on real datasets shows that the ordering of data sources produced by this MBA-based approach yields an integration system with significantly higher utility than a wide range of other ordering strategies.
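The abstract does not give the details of MBA, so the following is only a minimal sketch of one plausible reading: a greedy ordering that repeatedly picks the candidate source whose integration yields the largest increase in utility. The function names, the toy `SOURCE_RESULTS` data, and the choice of "number of distinct query results covered" as the utility are all illustrative assumptions, not the authors' implementation.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Set

# Hypothetical toy data: for each candidate source, the set of result ids
# it returns over some fixed query workload (not taken from the paper).
SOURCE_RESULTS: Dict[str, Set[int]] = {
    "src_a": {1, 2},
    "src_b": {1, 2, 3, 4, 8},
    "src_c": {6},
    "src_d": {3, 4, 5, 7},
}


def result_count_utility(integrated: FrozenSet[str]) -> int:
    """Assumed utility: number of distinct results the integrated sources return."""
    covered: Set[int] = set()
    for name in integrated:
        covered |= SOURCE_RESULTS[name]
    return len(covered)


def order_sources_greedy(
    candidates: Iterable[str],
    utility: Callable[[FrozenSet[str]], float],
) -> List[str]:
    """Order sources by repeatedly choosing the largest marginal utility gain."""
    integrated: Set[str] = set()
    remaining: Set[str] = set(candidates)
    order: List[str] = []
    while remaining:
        base = utility(frozenset(integrated))
        # sorted() makes tie-breaking deterministic (alphabetical by name).
        best = max(
            sorted(remaining),
            key=lambda s: utility(frozenset(integrated | {s})) - base,
        )
        order.append(best)
        integrated.add(best)
        remaining.remove(best)
    return order


if __name__ == "__main__":
    print(order_sources_greedy(SOURCE_RESULTS, result_count_utility))
```

In this toy setting, a source that overlaps heavily with already integrated sources is pushed toward the end of the ordering, because its marginal contribution to the result count is small.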
Keywords :
Internet; data mining; query processing; MBA; data source selection; large-scale deep Web data integration system; most-benefit approach; ordering candidate data source; query result number; utility function; Application software; Costs; Crawlers; HTML; Information processing; Information technology; Large scale integration; Rendering (computer graphics); Software engineering; data integration; deep web; most-benefit; order data source
Conference_Title :
Web Mining and Web-based Application, 2009. WMWA '09. Second Pacific-Asia Conference on
Conference_Location :
Wuhan
Print_ISBN :
978-0-7695-3646-0
DOI :
10.1109/WMWA.2009.25