Title :
A confidence-based entity resolution approach with incomplete information
Author :
Qi Gu ; Yan Zhang ; Jian Cao ; Guandong Xu ; Alfredo Cuzzocrea
Author_Institution :
Dept. of Comput. Sci. & Eng., Shanghai Jiao Tong Univ., Shanghai, China
Abstract :
Entity resolution identifies records from different data sources that refer to the same real-world entity, and it is an important prerequisite for integrating data from multiple sources. It relies mainly on similarity measures computed over data records. In practice, however, the quality of data sources is often poor. Web data sources in particular often provide only incomplete information, which makes it difficult to apply similarity measures directly to identify matching entities. To address this problem, the concept of confidence is introduced to measure the trustworthiness of a similarity calculation. An adaptive rule-based approach is used to calculate the similarity between records, and the confidence of that similarity is derived as well. The similarity and confidence are then propagated on the entity relational graph until a fixed point is reached. Finally, each pair of records is classified as matched or unmatched based on a threshold. We performed a series of experiments on real data sets, and the experimental results show that our approach performs better than the approaches it was compared with.
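The pipeline outlined in the abstract (similarity with confidence, propagation to a fixed point, thresholding) can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything here is an assumption for illustration and not the authors' method: the function names, the attribute weights, the definition of confidence as the share of attribute weight present in both records, the blending rule used in propagation, and the 0.6 threshold are all hypothetical.

```python
from difflib import SequenceMatcher


def attribute_similarity(a, b):
    """String similarity in [0, 1], using the standard library."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similarity_with_confidence(r1, r2, weights):
    """Return (similarity, confidence) for two records given as dicts.

    Only attributes present in both records are compared. Confidence is
    the share of total attribute weight that could be compared (an
    assumed definition, not taken from the paper).
    """
    total = sum(weights.values())
    covered = 0.0
    score = 0.0
    for attr, w in weights.items():
        v1, v2 = r1.get(attr), r2.get(attr)
        if v1 and v2:  # skip missing or empty values
            covered += w
            score += w * attribute_similarity(v1, v2)
    if covered == 0:
        return 0.0, 0.0
    return score / covered, covered / total


def propagate(initial, neighbors, max_iter=100, eps=1e-6):
    """Iterate similarities on a graph of record pairs until they settle.

    initial:   {pair: (similarity, confidence)}
    neighbors: {pair: [related pairs]}
    Assumed update rule: a pair keeps a `confidence` share of its own
    similarity and borrows the rest from the confidence-weighted mean
    of its neighbors' current similarities.
    """
    sims = {p: s for p, (s, _) in initial.items()}
    for _ in range(max_iter):
        new = {}
        for p, (s0, c0) in initial.items():
            nbrs = neighbors.get(p, [])
            wsum = sum(initial[q][1] for q in nbrs)
            if wsum > 0:
                nb = sum(initial[q][1] * sims[q] for q in nbrs) / wsum
                new[p] = c0 * s0 + (1 - c0) * nb
            else:
                new[p] = s0  # no usable neighbor evidence
        delta = max((abs(new[p] - sims[p]) for p in new), default=0.0)
        sims = new
        if delta < eps:  # approximate fixed point reached
            break
    return sims


def is_match(similarity, threshold=0.6):
    """Classify a pair as matched when its similarity meets the threshold."""
    return similarity >= threshold


# Usage: the second record lacks an email, so only half the weight is covered.
weights = {"name": 1.0, "email": 1.0}
r1 = {"name": "John Smith", "email": "js@example.com"}
r2 = {"name": "John Smith", "email": None}
sim, conf = similarity_with_confidence(r1, r2, weights)

# A low-confidence pair "p" is related to a high-confidence pair "q".
initial = {"p": (0.4, 0.5), "q": (1.0, 1.0)}
neighbors = {"p": ["q"], "q": ["p"]}
final = propagate(initial, neighbors)
```

In this sketch only the similarity is updated during propagation, with the initial confidence acting as a fixed weight; the abstract states that confidence is propagated too, which a fuller version would need to model.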
Keywords :
Internet; graph theory; knowledge based systems; trusted computing; Web data source; adaptive rule-based approach; confidence-based entity resolution approach; data quality; data record; real-world entity; relational graph; similarity calculation; similarity measure; trustworthy; Accuracy; Frequency modulation; Training; Vectors; Accuracy; Confidence; Cover Rate; Entity Resolution;
Conference_Title :
Data Science and Advanced Analytics (DSAA), 2014 International Conference on
DOI :
10.1109/DSAA.2014.7058058