• DocumentCode
    3776608
  • Title

    Proof of duplication detection in data by applying similarity strategies

  • Author

    Varsha Wandhekar;Arti Mohanpurkar

  • Author_Institution
    Dr. D.Y. Patil School of Engg & Tech, Savitribai Phule University of Pune, MH, India
  • fYear
    2015
  • Firstpage
    429
  • Lastpage
    434
  • Abstract
    De-duplication is the process of identifying all records within a data set that refer to the same real-world entity. Data gathered from various sources may have quality issues. The approach identifies duplicates using windowing and blocking strategies. The objective is to achieve better precision and good efficiency and to reduce the false positive rate, all in accordance with the estimated similarities of records. Various similarity metrics are commonly used to recognize similar field entries, so the main focus of this paper is on applying an appropriate similarity measure to the appropriate data in order to identify duplicates correctly. De-duplication also provides additional information about the similarities between two entities. In today's data-centric environment, similarity measures have a large number of defects; as a result, identifying duplicates has always been a challenging task. In this paper, the primary focus is on the exact identification of duplicates in a database by applying the concepts of windowing and blocking.
  • Keywords
    "Databases","Standardization","Measurement","Sorting","Detection algorithms","Algorithm design and analysis","Cleaning"
  • Publisher
    ieee
  • Conference_Titel
    Information Processing (ICIP), 2015 International Conference on
  • Type

    conf

  • DOI
    10.1109/INFOP.2015.7489421
  • Filename
    7489421