Title :
An Efficient and Effective Duplication Detection Method in Large Database Applications
Author_Institution :
Dept. of Math. & Comput., Univ. of Southern Queensland, Toowoomba, QLD, Australia
Abstract :
In this paper, we develop a robust data cleaning technique called PC-Filter+ (PC stands for partition comparison). It builds on its predecessor, PC-Filter, and detects duplicate records in large databases both effectively and efficiently. PC-Filter+ provides more flexible algorithmic options for constructing the Partition Comparison Graph (PCG). In addition, PC-Filter+ can perform duplicate detection under different memory constraints.
Keywords :
data mining; data warehouses; PC-Filter+; duplication detection method; large database applications; partition comparison graph; robust data cleaning technique; Cleaning; Clustering algorithms; Complexity theory; Databases; Measurement; Partitioning algorithms; Sorting;
Conference_Title :
2010 4th International Conference on Network and System Security (NSS)
Conference_Location :
Melbourne, VIC
Print_ISBN :
978-1-4244-8484-3
Electronic_ISBN :
978-0-7695-4159-4
DOI :
10.1109/NSS.2010.78