DocumentCode :
3576301
Title :
ERGP: A Combined Entity Resolution Approach with Genetic Programming
Author :
Chenchen Sun ; Derong Shen ; Yue Kou ; Tiezheng Nie ; Ge Yu
Author_Institution :
Inst. of Comput. Software, Northeastern Univ., Shenyang, China
fYear :
2014
Firstpage :
215
Lastpage :
220
Abstract :
In the real world, an entity often has more than one representation across different data sources, and those representations may contain errors. These differing representations and possible errors make identifying entities, a task known as entity resolution, crucial in data integration and data cleaning. We propose a novel genetic programming approach to entity resolution, named Entity Resolution with Genetic Programming (ERGP). ERGP learns an effective entity resolution classifier by combining comparisons of several different properties. The evaluation shows that ERGP outperforms state-of-the-art entity resolution algorithms. Above all, ERGP sets the threshold for each individual attribute-pair comparison itself, so the user does not have to set any thresholds.
Keywords :
data integration; genetic algorithms; pattern classification; ERGP; attribute pair; combined entity resolution approach with genetic programming; data cleaning; data sources; effective entity resolution classifier; Classification algorithms; Erbium; Genetic programming; Sociology; Statistics; Training data; Entity resolution
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Web Information System and Application Conference (WISA), 2014 11th
Print_ISBN :
978-1-4799-5726-2
Type :
conf
DOI :
10.1109/WISA.2014.46
Filename :
7058015