• DocumentCode
    657982
  • Title
    Similar data elimination: MFB algorithm
  • Author
    Boufares, Faouzi; Ben Salem, Aicha; Rehab, Moufida; Correia, Sebastiao
  • Author_Institution
    Lab. LIPN, Univ. Paris 13, Villetaneuse, France
  • fYear
    2013
  • fDate
    6-8 May 2013
  • Firstpage
    289
  • Lastpage
    293
  • Abstract
    Complex applications such as knowledge extraction, data mining, e-learning, and web applications use heterogeneous and distributed data. In this context, the need to integrate data and improve its quality is increasingly felt. Eliminating duplicate and similar data remains a relevant problem, both in terms of performance and in terms of how similarity rules are defined. This paper presents a new deduplication algorithm based on two functions, Match and Merge. The algorithm is evaluated experimentally on a set of randomly generated data.
  • Keywords
    data analysis; data integration; data mining; merging; MFB algorithm; Match function; Merge function; Web applications; data quality improvement; deduplication algorithm; distributed data; duplicate elimination; e-learning; heterogeneous data; knowledge extraction; similar data elimination; similarity rules; Cleaning; Companies; Couplings; Data mining; Knowledge discovery; Semantics; Switches; Data Quality; Deduplication; Duplicates; Match; Merge; Similar Data
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Control, Decision and Information Technologies (CoDIT), 2013 International Conference on
  • Conference_Location
    Hammamet
  • Print_ISBN
    978-1-4673-5547-6
  • Type
    conf
  • DOI
    10.1109/CoDIT.2013.6689559
  • Filename
    6689559