DocumentCode
710095
Title
Query-time record linkage and fusion over Web databases
Author
Rezig, El Kindi ; Dragut, Eduard C. ; Ouzzani, Mourad ; Elmagarmid, Ahmed K.
Author_Institution
Dept. of Comput. Sci., Purdue Univ., West Lafayette, IN, USA
fYear
2015
fDate
13-17 April 2015
Firstpage
42
Lastpage
53
Abstract
Data-intensive Web applications usually require integrating data from Web sources at query time. The sources may refer to the same real-world entity in different ways, and some may even provide outdated or erroneous data. An important task is to recognize and merge the records that refer to the same real-world entity at query time. Most existing duplicate detection and fusion techniques work in the off-line setting and do not meet the online constraint. There are at least two aspects that differentiate online duplicate detection and fusion from its off-line counterpart. (i) The latter assumes that the entire data set is available, while the former cannot make such an assumption. (ii) Several query submissions may be required to compute the “ideal” representation of an entity in the online setting. This paper presents a general framework for the online setting based on an iterative record-based caching technique. A set of frequently requested records is deduplicated off-line and cached for future reference. Newly arriving records in response to a query are deduplicated jointly with the records in the cache, presented to the user, and appended to the cache. Experiments with real and synthetic data show the benefit of our solution over traditional record linkage techniques applied to an online setting.
Keywords
Internet; data integration; database management systems; iterative methods; query processing; sensor fusion; Web databases; Web sources; data-intensive Web applications; iterative record-based caching technique; online duplicate detection technique; online duplicate fusion technique; query submissions; query-time record fusion; query-time record linkage; real-world entity; record merging; record recognition; Accuracy; Business; Couplings; Data structures; Indexing
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Engineering (ICDE), 2015 IEEE 31st International Conference on
Conference_Location
Seoul
Type
conf
DOI
10.1109/ICDE.2015.7113271
Filename
7113271