DocumentCode :
3087424
Title :
An Efficient and Effective Duplication Detection Method in Large Database Applications
Author :
Zhang, Ji
Author_Institution :
Dept. of Math. & Comput., Univ. of Southern Queensland, Toowoomba, QLD, Australia
fYear :
2010
fDate :
1-3 Sept. 2010
Firstpage :
494
Lastpage :
501
Abstract :
In this paper, we develop a robust data cleaning technique called PC-Filter+ (PC stands for partition comparison), which builds on its predecessor, for effective and efficient duplicate record detection in large databases. PC-Filter+ provides more flexible algorithmic options for constructing the Partition Comparison Graph (PCG). In addition, PC-Filter+ can perform duplicate detection under different memory constraints.
Keywords :
data mining; data warehouses; PC-Filter+; duplication detection method; large database applications; partition comparison graph; robust data cleaning technique; Cleaning; Clustering algorithms; Complexity theory; Databases; Measurement; Partitioning algorithms; Sorting
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Network and System Security (NSS), 2010 4th International Conference on
Conference_Location :
Melbourne, VIC
Print_ISBN :
978-1-4244-8484-3
Electronic_ISBN :
978-0-7695-4159-4
Type :
conf
DOI :
10.1109/NSS.2010.78
Filename :
5635844