DocumentCode :
1761532
Title :
Rule-Based Method for Entity Resolution
Author :
Lingli Li ; Jianzhong Li ; Hong Gao
Author_Institution :
Dept. of Comput. Sci. & Technol., Harbin Inst. of Technol., Harbin, China
Volume :
27
Issue :
1
fYear :
2015
fDate :
Jan. 1 2015
Firstpage :
250
Lastpage :
263
Abstract :
The objective of entity resolution (ER) is to identify records that refer to the same real-world entity. Traditional ER approaches identify records based on pairwise similarity comparisons, which assume that records referring to the same entity are more similar to each other than to records of other entities. However, this assumption does not always hold in practice, and similarity comparisons do not work well when it breaks. We propose a new class of rules that can describe the complex matching conditions between records and entities. Based on this class of rules, we present the rule-based entity resolution problem and develop an online approach for ER. In this framework, we apply rules to each record to identify which entity the record refers to. Additionally, we propose an effective and efficient rule discovery algorithm. We experimentally evaluated our rule-based ER algorithm on real data sets. The experimental results show that both our rule discovery algorithm and our rule-based ER algorithm achieve high performance.
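The idea of applying rules to each record can be illustrated with a minimal sketch. This is not the authors' rule language or algorithm, which the abstract does not specify. It assumes a rule is a conjunction of attribute predicates attached to an entity identifier, and that the first matching rule wins. The names `Rule`, `contains`, `resolve`, the attribute names, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Record = Dict[str, str]
Predicate = Callable[[Record], bool]


@dataclass
class Rule:
    """Hypothetical rule: if every predicate holds, the record refers to entity_id."""
    entity_id: str
    predicates: List[Predicate]

    def matches(self, record: Record) -> bool:
        # A rule fires only when all of its conditions hold (conjunction).
        return all(pred(record) for pred in self.predicates)


def contains(attr: str, token: str) -> Predicate:
    """Build a predicate: the attribute value contains the token (case-insensitive)."""
    return lambda record: token.lower() in record.get(attr, "").lower()


def resolve(record: Record, rules: List[Rule]) -> Optional[str]:
    """Return the entity of the first matching rule, or None if no rule fires."""
    for rule in rules:
        if rule.matches(record):
            return rule.entity_id
    return None


# Assumed example: two distinct authors who share a surname, so name
# similarity alone cannot separate them, but a second attribute can.
RULES = [
    Rule("author_1", [contains("name", "wang"), contains("coauthor", "lin")]),
    Rule("author_2", [contains("name", "wang"),
                      contains("affiliation", "example university")]),
]

if __name__ == "__main__":
    incoming = [
        {"name": "W. Wang", "coauthor": "X. Lin"},
        {"name": "Wei Wang", "affiliation": "Example University"},
        {"name": "J. Smith"},
    ]
    for rec in incoming:
        print(rec, "->", resolve(rec, RULES))
```

Because each record is resolved on its own against the rule set, records can be processed one at a time as they arrive, without pairwise comparisons against every other record.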
Keywords :
data handling; knowledge based systems; ER approach; complex matching conditions; data cleaning; entity resolution; pairwise similarity comparisons; rule discovery algorithm; rule-based method; Algorithm design and analysis; Classification algorithms; Cleaning; Erbium; Semantics; Syntactics; Training data; rule learning
fLanguage :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2014.2320713
Filename :
6807749