• Title of article

    MFIBlocks: An effective blocking algorithm for entity resolution

  • Author/Authors

    Batya Kenig, Avigdor Gal

  • Issue Information
    Journal with serial numbering, year 2013
  • Pages
    19
  • From page
    908
  • To page
    926
  • Abstract
    Entity resolution is the process of discovering groups of tuples that correspond to the same real-world entity. Blocking algorithms separate tuples into blocks that are likely to contain matching pairs. Tuning is a major challenge in the blocking process and in particular, high expertise is needed in contemporary blocking algorithms to construct a blocking key, based on which tuples are assigned to blocks. In this work, we introduce a blocking approach that avoids selecting a blocking key altogether, relieving the user from this difficult task. The approach is based on maximal frequent itemsets selection, allowing early evaluation of block quality based on the overall commonality of its members. A unique feature of the proposed algorithm is the use of prior knowledge of the estimated size of duplicate sets in enhancing the blocking accuracy. We report on a thorough empirical analysis, using common benchmarks of both real-world and synthetic datasets to exhibit the effectiveness and efficiency of our approach.
  • Keywords
    Entity resolution, blocking
  • Journal title
    Information Systems
  • Serial Year
    2013
  • Record number

    1230336