Title :
An Algorithm for Detecting Similar Data in Replicated Databases Using Multi Criteria Decision Making
Author :
Sorkhabi, Vahideh Baradaran ; Derakhshi, M.-R.F. ; Shahamfar, Hadi
Author_Institution :
Dept. of Comput. Eng., Azad Univ. Shabestar Branch, Tabriz, Iran
Abstract :
Duplicate data can cause many problems in all types of databases, especially distributed and replicated databases. Such data undermine consistency and introduce redundancy, two important problems in databases. For many reasons, databases or replicas may contain similar records that look different but refer to the same real-world entity. These reasons include entry errors, unstandardized abbreviations, differences in the details of various database schemas, packet loss, and noisy environments. This paper proposes an approach to detect duplicate or similar data that, because they are faulty or noisy, are treated as different data among the various replicas in distributed or replicated databases. A multi-criteria decision-making algorithm is employed for this purpose. To detect identical records, priorities are first defined for the fields, and then the percentage of similarity between records is evaluated. The algorithm's time overhead is reduced by using a special order of priorities. The multi-criteria decision-making algorithm is also used to decide how to combine records with each other and which record is the complete and true one. An instance-based learning approach is employed to learn how to set priorities for the various fields, create a uniform schema, and find each field's appropriate match in the other replica.
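The priority-weighted comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the field names, the hand-set weights in `PRIORITIES`, the 0.8 threshold, and the use of `difflib.SequenceMatcher` as the per-field similarity measure are all assumptions made for illustration. The sketch compares fields in descending priority order and stops early once the remaining fields can no longer lift the score to the threshold, which is one plausible reading of how a special order of priorities could reduce time overhead.

```python
from difflib import SequenceMatcher

# Hypothetical field priorities (weights). The paper learns priorities with an
# instance-based approach; here they are fixed by hand for illustration only.
PRIORITIES = {"name": 0.5, "email": 0.3, "city": 0.2}


def field_similarity(a: str, b: str) -> float:
    """Return a similarity in [0, 1] for two field values (case/space-insensitive)."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def record_similarity(r1: dict, r2: dict, priorities=PRIORITIES, threshold=0.8):
    """Return (weighted similarity so far, is_duplicate) for two records.

    Fields are compared from highest to lowest priority. If even perfect
    matches on the remaining fields could not reach the threshold, the
    comparison stops early and the pair is reported as not duplicate.
    """
    total = sum(priorities.values())
    ordered = sorted(priorities.items(), key=lambda kv: kv[1], reverse=True)
    score = 0.0
    for i, (field, weight) in enumerate(ordered):
        score += weight * field_similarity(r1.get(field, ""), r2.get(field, ""))
        # Best case: every field not yet compared matches perfectly.
        best_remaining = sum(w for _, w in ordered[i + 1:])
        if (score + best_remaining) / total < threshold:
            return score / total, False  # early exit: threshold unreachable
    similarity = score / total
    return similarity, similarity >= threshold


# Two replicas holding a noisy copy of the same real-world entity.
replica_a = {"name": "Jon Smith", "email": "jon.smith@example.com", "city": "Tabriz"}
replica_b = {"name": "John Smith", "email": "jon.smith@example.com", "city": "Tabriz "}
similarity, is_duplicate = record_similarity(replica_a, replica_b)
print(f"similarity={similarity:.2f}, duplicate={is_duplicate}")
```

Ordering by priority means that a mismatch on a heavily weighted field rejects a pair after a single comparison, while lightly weighted fields are examined only for pairs that are still candidates.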
Keywords :
database management systems; decision making; operations research; databases schemas; instance based learning approach; multicriteria decision making algorithm; similar data detection; Computer science; Data engineering; Decision making; Delay; Distributed computing; Distributed databases; Mathematics; Redundancy; Scalability; Working environment noise; Replicated database; distributed database; instance based learning; similar data;
Conference_Title :
Environmental and Computer Science, 2009. ICECS '09. Second International Conference on
Conference_Location :
Dubai
Print_ISBN :
978-0-7695-3937-9
Electronic_ISBN :
978-1-4244-5591-1
DOI :
10.1109/ICECS.2009.71