• DocumentCode
    506731
  • Title
    A property optimization method in support of approximately duplicated records detecting
  • Author
    Xiao Mansheng ; Liu Youshi ; Zhou Xiaoqi
  • Author_Institution
    Sch. of Sci., Hunan Univ. of Technol., Zhuzhou, China
  • Volume
    3
  • fYear
    2009
  • fDate
    20-22 Nov. 2009
  • Firstpage
    118
  • Lastpage
    122
  • Abstract
    When approximately duplicated records are detected in large datasets, the data composition is complicated and the records have too many properties, so measurement accuracy is low and implementation cost is excessive. To address these problems, a sub-fuzzy clustering property optimization method based on grouping is proposed. First, the properties of the records in each group are processed to reduce the property dimension effectively and obtain a representation of the group; then a similarity comparison method is used to detect approximately duplicated records within groups. Theoretical analysis and experiments show that this method achieves higher detection accuracy and efficiency and better solves the problem of recognizing approximately duplicated records in large datasets.
  • Keywords
    data handling; fuzzy set theory; optimisation; pattern clustering; approximately duplicated records detecting; fuzzy clustering; property optimization method; Assembly; Clustering algorithms; Clustering methods; Cost function; Data analysis; Data mining; Data warehouses; Dictionaries; Educational institutions; Optimization methods; Approximately Duplicated Records; Property Optimization; Similarity; Sub-Fuzzy Clustering;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Computing and Intelligent Systems, 2009. ICIS 2009. IEEE International Conference on
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-1-4244-4754-1
  • Electronic_ISBN
    978-1-4244-4738-1
  • Type
    conf
  • DOI
    10.1109/ICICISYS.2009.5358212
  • Filename
    5358212
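A minimal sketch, in Python, of the generic group-then-compare pipeline outlined in the Abstract. It is not the authors' sub-fuzzy clustering method, which this record does not specify. Several pieces are assumptions. Grouping uses a hypothetical blocking key (`block_key`, the first three lowercase characters of the first selected property). Property reduction is approximated by keeping only a chosen subset of properties (`props`). Similarity is the mean of `difflib.SequenceMatcher` ratios across those properties, compared against an assumed threshold of 0.85. The sample records are invented for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def block_key(record, props):
    # Hypothetical grouping key (an assumption, not the paper's):
    # first 3 lowercase characters of the first reduced property.
    return str(record[props[0]]).strip().lower()[:3]


def record_similarity(a, b, props):
    # Mean per-property string similarity over the reduced property set.
    scores = [
        SequenceMatcher(None, str(a[p]).lower(), str(b[p]).lower()).ratio()
        for p in props
    ]
    return sum(scores) / len(scores)


def detect_duplicates(records, props, threshold=0.85):
    # Step 1: group record indices by the blocking key.
    groups = defaultdict(list)
    for idx, rec in enumerate(records):
        groups[block_key(rec, props)].append(idx)
    # Step 2: compare records pairwise only within each group.
    pairs = []
    for members in groups.values():
        for i, j in combinations(members, 2):
            if record_similarity(records[i], records[j], props) >= threshold:
                pairs.append((i, j))
    return sorted(pairs)


# Illustrative (invented) records; "phone" is dropped by the reduced property set.
records = [
    {"name": "Zhou Xiaoqi", "city": "Zhuzhou", "phone": "0731-1234"},
    {"name": "Zhou Xiao-qi", "city": "Zhuzhou", "phone": "0731 1234"},
    {"name": "Liu Youshi", "city": "Shanghai", "phone": "021-5555"},
]
props = ["name", "city"]

print(detect_duplicates(records, props))  # [(0, 1)]
```

Grouping restricts pairwise comparisons to records that share a key, which is where the efficiency gain of a grouping step comes from. The trade-off is that near-duplicates with different keys are never compared.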