DocumentCode
506731
Title
A property optimization method in support of approximately duplicated records detecting
Author
Xiao Mansheng ; Liu Youshi ; Zhou Xiaoqi
Author_Institution
Sch. of Sci., Hunan Univ. of Technol., Zhuzhou, China
Volume
3
fYear
2009
fDate
20-22 Nov. 2009
Firstpage
118
Lastpage
122
Abstract
In detecting approximately duplicated records in large datasets, the data composition is complex and records have many properties, so detection accuracy is low and implementation cost is high. To address these problems, a grouping-based sub-fuzzy clustering property optimization method is proposed. First, the properties of the records in each group are processed to reduce the property dimension effectively and to obtain a representation of the group; then a similarity comparison method is used to detect approximately duplicated records within groups. Theoretical analysis and experiments show that this method achieves higher detection accuracy and efficiency, and can better solve the problem of recognizing approximately duplicated records in large datasets.
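The two-stage pipeline described in the abstract (reduce properties per group, then compare records within each group) can be illustrated with a minimal sketch. This is not the authors' sub-fuzzy clustering method, which the abstract does not specify. As labeled assumptions, it groups by an exact key, ranks properties by their number of distinct values as a stand-in for property optimization, and uses `difflib.SequenceMatcher` for similarity. The function names, the `keep` parameter, and the `threshold` value are all hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def select_properties(records, keep):
    """Keep the `keep` properties with the most distinct values in a group.

    Assumption: a simple stand-in for the paper's property optimization,
    not the sub-fuzzy clustering procedure itself.
    """
    fields = list(records[0].keys())
    distinct = {f: len({r[f] for r in records}) for f in fields}
    ranked = sorted(fields, key=lambda f: distinct[f], reverse=True)
    return ranked[:keep]


def similarity(a, b, fields):
    """Mean string similarity over the selected fields, in [0, 1]."""
    scores = [SequenceMatcher(None, str(a[f]), str(b[f])).ratio() for f in fields]
    return sum(scores) / len(scores)


def find_near_duplicates(records, group_key, keep=2, threshold=0.85):
    """Return index pairs of likely duplicates, compared only within groups."""
    # Stage 0: partition record indices by the (hypothetical) grouping key.
    groups = defaultdict(list)
    for idx, rec in enumerate(records):
        groups[rec[group_key]].append(idx)

    pairs = []
    for members in groups.values():
        if len(members) < 2:
            continue
        # Stage 1: reduce the property dimension for this group.
        fields = select_properties([records[i] for i in members], keep)
        # Stage 2: pairwise similarity comparison inside the group only.
        for i, j in combinations(members, 2):
            if similarity(records[i], records[j], fields) >= threshold:
                pairs.append((i, j))
    return pairs
```

Restricting comparisons to records in the same group, and to a reduced set of properties, is what keeps the pairwise cost manageable on large datasets. Both the grouping key and the threshold would need tuning on real data.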
Keywords
data handling; fuzzy set theory; optimisation; pattern clustering; approximately duplicated records detecting; fuzzy clustering; property optimization method; Assembly; Clustering algorithms; Clustering methods; Cost function; Data analysis; Data mining; Data warehouses; Dictionaries; Educational institutions; Optimization methods; Approximately Duplicated Records; Property Optimization; Similarity; Sub-Fuzzy Clustering;
fLanguage
English
Publisher
IEEE
Conference_Titel
Intelligent Computing and Intelligent Systems, 2009. ICIS 2009. IEEE International Conference on
Conference_Location
Shanghai
Print_ISBN
978-1-4244-4754-1
Electronic_ISBN
978-1-4244-4738-1
Type
conf
DOI
10.1109/ICICISYS.2009.5358212
Filename
5358212