• DocumentCode
    479779
  • Title

    A Mutual-Information-Based Approach to Entity Reconciliation in Heterogeneous Databases

  • Author

    Bao-hua Qiang ; Jian-qing Xi

  • Author_Institution
    Sch. of Comput. Sci. & Eng., South China Univ. of Technol., Guangzhou
  • Volume
    1
  • fYear
    2008
  • fDate
    12-14 Dec. 2008
  • Firstpage
    666
  • Lastpage
    669
  • Abstract
    Entity reconciliation is crucial to data interoperability in heterogeneous databases. In our previous work, we proposed an entity-matching algorithm based on attribute entropy to identify corresponding entities; it overcomes limitations of the main existing approaches and markedly improves the precision of entity matching. Further research shows that attributes of differing importance for identifying entities can receive the same weight when weights are derived from attribute entropy alone. In this paper we therefore use mutual information to quantify attribute weights, because mutual information describes well the correlation between the probability distributions of two attributes. Based on this idea, we present the final entropy computation algorithm and an entity reconciliation algorithm based on mutual information. Experimental results on real-world data show that the mutual-information-based approach achieves better performance.
  • Keywords
    data handling; distributed databases; entropy; open systems; attribute entropy; data interoperability; entities matching algorithm; entity reconciliation; heterogeneous databases; mutual-information-based approach; Computer science; Data engineering; Databases; Distributed computing; Educational institutions; Entropy; Information science; Mutual information; Probability distribution; Software engineering; entities matching; mutual information
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science and Software Engineering, 2008 International Conference on
  • Conference_Location
    Wuhan, Hubei
  • Print_ISBN
    978-0-7695-3336-0
  • Type

    conf

  • DOI
    10.1109/CSSE.2008.535
  • Filename
    4721837