• DocumentCode
    1299519
  • Title
    Efficient Techniques for Online Record Linkage
  • Author
    Dey, Debabrata ; Mookerjee, Vijay S. ; Liu, Dengpan
  • Author_Institution
    Foster Sch. of Bus., Univ. of Washington, Seattle, WA, USA
  • Volume
    23
  • Issue
    3
  • fYear
    2011
  • fDate
    3/1/2011 12:00:00 AM
  • Firstpage
    373
  • Lastpage
    387
  • Abstract
    The need to consolidate the information contained in heterogeneous data sources has been widely documented in recent years. In order to accomplish this goal, an organization must resolve several types of heterogeneity problems, especially the entity heterogeneity problem that arises when the same real-world entity type is represented using different identifiers in different data sources. Statistical record linkage techniques could be used for resolving this problem. However, the use of such techniques for online record linkage could pose a tremendous communication bottleneck in a distributed environment (where entity heterogeneity problems are often encountered). In order to resolve this issue, we develop a matching tree, similar to a decision tree, and use it to propose techniques that reduce the communication overhead significantly, while providing matching decisions that are guaranteed to be the same as those obtained using the conventional linkage technique. These techniques have been implemented, and experiments with real-world and synthetic databases show significant reduction in communication overhead.
  • Keywords
    Internet; distributed databases; communication bottleneck; data sources; distributed environment; online record linkage; statistical record linkage techniques; synthetic databases; Companies; Couplings; Distributed databases; Insurance; Standards organizations; Record linkage; data heterogeneity; decision tree; entity matching; sequential decision making
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2010.134
  • Filename
    5551133