DocumentCode
3491951
Title
Beyond probabilistic record linkage: Using neural networks and complex features to improve genealogical record linkage
Author
Wilson, D. Randall
Author_Institution
FamilySearch, Salt Lake City, UT, USA
fYear
2011
fDate
July 31 2011-Aug. 5 2011
Firstpage
9
Lastpage
14
Abstract
Probabilistic record linkage has been used for many years in a variety of industries, including medical, government, private sector and research groups. The formulas used for probabilistic record linkage have been recognized by some as being equivalent to the naïve Bayes classifier. While this method can produce useful results, it is not difficult to improve accuracy by using one of a host of other machine learning or neural network algorithms. Even a simple single-layer perceptron tends to outperform the naïve Bayes classifier, and thus traditional probabilistic record linkage methods, by a substantial margin. Furthermore, many record linkage systems use simple field comparisons rather than more complex features, partly because of the limits of the probabilistic formulas they use. This paper presents an overview of probabilistic record linkage, shows how to cast it in machine learning terms, and then shows that it is equivalent to a naïve Bayes classifier. It then discusses how to use features more complex than simple field comparisons, and shows how probabilistic record linkage formulas can be modified to handle them. Finally, it demonstrates a huge improvement in accuracy over traditional probabilistic record linkage through the use of neural networks and higher-level matching features, on a large set (80,000 pairs) of labeled pairs of genealogical records used by FamilySearch.org.
Keywords
Bayes methods; learning (artificial intelligence); pattern matching; perceptrons; records management; genealogical record linkage method; high-level matching features; machine learning; naive Bayes classifier; neural network algorithms; probabilistic record linkage method; single-layer perceptron; Accuracy; Classification algorithms; Couplings; Fires; Neural networks; Probabilistic logic; Training
fLanguage
English
Publisher
IEEE
Conference_Titel
Neural Networks (IJCNN), The 2011 International Joint Conference on
Conference_Location
San Jose, CA
ISSN
2161-4393
Print_ISBN
978-1-4244-9635-8
Type
conf
DOI
10.1109/IJCNN.2011.6033192
Filename
6033192