Title of article :
Adaptive Approximate Record Matching
Author/Authors :
Rahnamoun, Ramin Computer Engineering Department - Tehran Central Branch - Azad University, Tehran
Pages :
5
From page :
23
To page :
27
Abstract :
Typographical data entry errors and incomplete documents produce imperfect records in real-world databases. These errors generate distinct records that belong to the same entity. The aim of Approximate Record Matching is to find the multiple records that belong to one entity. In this paper, an algorithm for Approximate Record Matching is proposed that adapts automatically to the error patterns in its input. In the field matching phase, the edit distance method is used, customized for Persian-language problems such as the similarity of certain Persian characters and common Persian typographical errors. In the record matching phase, the importance of each field is set by a coefficient assigned to that field. Because typographical error patterns change, the coefficient of each field must change dynamically. For this reason, a Genetic Algorithm (GA) is used for supervised learning of the coefficient values. The simulation results show that this algorithm performs well compared with other methods (such as Decision Trees).
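The two matching phases described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the set of confusable characters, the reduced substitution cost of 0.2, the length-based normalization, and all function names are assumptions made for illustration. The coefficients are passed in as fixed values here, whereas the paper learns them with a Genetic Algorithm.

```python
# Hypothetical pairs of easily confused characters (illustrative only, not
# taken from the paper): Persian vs. Arabic yeh, Persian vs. Arabic kaf.
SIMILAR = {frozenset("\u06cc\u064a"), frozenset("\u06a9\u0643")}


def sub_cost(a, b, similar_cost=0.2):
    """Substitution cost: 0 for equal characters, reduced for similar ones."""
    if a == b:
        return 0.0
    if frozenset((a, b)) in SIMILAR:
        return similar_cost  # assumed value, chosen for illustration
    return 1.0


def weighted_edit_distance(s, t):
    """Edit distance with unit insert/delete cost and custom substitution cost."""
    prev = [float(j) for j in range(len(t) + 1)]
    for i, a in enumerate(s, 1):
        cur = [float(i)]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1.0,                   # delete a
                           cur[j - 1] + 1.0,                # insert b
                           prev[j - 1] + sub_cost(a, b)))   # substitute
        prev = cur
    return prev[-1]


def field_similarity(s, t):
    """Field matching phase: map the distance to a similarity in [0, 1]."""
    longest = max(len(s), len(t))
    if longest == 0:
        return 1.0
    return 1.0 - weighted_edit_distance(s, t) / longest


def record_similarity(r1, r2, coeffs):
    """Record matching phase: coefficient-weighted mean of field similarities."""
    total = sum(coeffs.values())
    return sum(w * field_similarity(r1[f], r2[f]) for f, w in coeffs.items()) / total
```

Two records would then be treated as the same entity when `record_similarity` exceeds a chosen threshold. The threshold and the field coefficients are the values a supervised learner such as a GA would tune against labeled record pairs.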
Keywords :
record matching, edit distance, data cleaning, genetic algorithms
Journal title :
Astroparticle Physics
Serial Year :
2014
Record number :
2483188