DocumentCode :
506731
Title :
A property optimization method in support of approximately duplicated records detecting
Author :
Xiao Mansheng ; Liu Youshi ; Zhou Xiaoqi
Author_Institution :
Sch. of Sci., Hunan Univ. of Technol., Zhuzhou, China
Volume :
3
fYear :
2009
fDate :
20-22 Nov. 2009
Firstpage :
118
Lastpage :
122
Abstract :
When detecting approximately duplicated records in a large dataset, the data composition is complex and the records have many properties, so the measurement accuracy is low and the implementation cost is high. To address these problems, a grouping-based sub-fuzzy clustering property optimization method is proposed. First, the properties of the records in each group are processed to effectively reduce the property dimension and obtain a representation of the group; then a similarity comparison method is used to detect approximately duplicated records within groups. Theoretical analysis and experiments show that this method achieves higher detection accuracy and efficiency, and can better solve the problem of recognizing approximately duplicated records in large datasets.
Keywords :
data handling; fuzzy set theory; optimisation; pattern clustering; approximately duplicated records detecting; fuzzy clustering; property optimization method; Assembly; Clustering algorithms; Clustering methods; Cost function; Data analysis; Data mining; Data warehouses; Dictionaries; Educational institutions; Optimization methods; Approximately Duplicated Records; Property Optimization; Similarity; Sub-Fuzzy Clustering;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Intelligent Computing and Intelligent Systems, 2009. ICIS 2009. IEEE International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-1-4244-4754-1
Electronic_ISBN :
978-1-4244-4738-1
Type :
conf
DOI :
10.1109/ICICISYS.2009.5358212
Filename :
5358212