• DocumentCode
    3491951
  • Title
    Beyond probabilistic record linkage: Using neural networks and complex features to improve genealogical record linkage
  • Author
    Wilson, D. Randall
  • Author_Institution
    FamilySearch, Salt Lake City, UT, USA
  • fYear
    2011
  • fDate
    July 31 2011-Aug. 5 2011
  • Firstpage
    9
  • Lastpage
    14
  • Abstract
    Probabilistic record linkage has been used for many years in a variety of industries, including medical, government, private sector and research groups. The formulas used for probabilistic record linkage have been recognized by some as being equivalent to the naïve Bayes classifier. While this method can produce useful results, it is not difficult to improve accuracy by using one of a host of other machine learning or neural network algorithms. Even a simple single-layer perceptron tends to outperform the naïve Bayes classifier, and thus traditional probabilistic record linkage methods, by a substantial margin. Furthermore, many record linkage systems use simple field comparisons rather than more complex features, partially due to the limits of the probabilistic formulas they use. This paper presents an overview of probabilistic record linkage, shows how to cast it in machine learning terms, and then shows that it is equivalent to a naïve Bayes classifier. It then discusses how to use more complex features than simple field comparisons, and shows how probabilistic record linkage formulas can be modified to handle this. Finally, it demonstrates a huge improvement in accuracy through the use of neural networks and higher-level matching features, compared to traditional probabilistic record linkage on a large (80,000 pair) set of labeled pairs of genealogical records used by FamilySearch.org.
  • Keywords
    Bayes methods; learning (artificial intelligence); pattern matching; perceptrons; records management; genealogical record linkage method; high-level matching features; machine learning; naive Bayes classifier; neural network algorithms; probabilistic record linkage method; single-layer perceptron; Accuracy; Classification algorithms; Couplings; Fires; Neural networks; Probabilistic logic; Training;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Networks (IJCNN), The 2011 International Joint Conference on
  • Conference_Location
    San Jose, CA
  • ISSN
    2161-4393
  • Print_ISBN
    978-1-4244-9635-8
  • Type
    conf
  • DOI
    10.1109/IJCNN.2011.6033192
  • Filename
    6033192