• DocumentCode
    3576301
  • Title

    ERGP: A Combined Entity Resolution Approach with Genetic Programming

  • Author

    Chenchen Sun ; Derong Shen ; Yue Kou ; Tiezheng Nie ; Ge Yu

  • Author_Institution
    Inst. of Comput. Software, Northeastern Univ., Shenyang, China
  • fYear
    2014
  • Firstpage
    215
  • Lastpage
    220
  • Abstract
    In the real world, an entity often has more than one representation across different data sources, and some of these representations contain errors. These differing representations and possible errors make identifying entities, a task known as entity resolution, crucial in data integration and data cleaning. We propose a novel genetic-programming approach to entity resolution named Entity Resolution with Genetic Programming (ERGP). ERGP learns an effective entity resolution classifier by combining comparisons of several different properties. The evaluation shows that ERGP outperforms state-of-the-art entity resolution algorithms. Above all, ERGP sets the threshold for each individual attribute-pair comparison itself, so the user does not have to set any thresholds.
  • Keywords
    data integration; genetic algorithms; pattern classification; ERGP; attribute pair; combined entity resolution approach with genetic programming; data cleaning; data sources; effective entity resolution classifier; Classification algorithms; Erbium; Genetic programming; Sociology; Statistics; Training data; Entity resolution
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Information System and Application Conference (WISA), 2014 11th
  • Print_ISBN
    978-1-4799-5726-2
  • Type

    conf

  • DOI
    10.1109/WISA.2014.46
  • Filename
    7058015