Title :
Research of Duplicate Record Cleaning Technology Based on a Reformative Keywords Matching Algorithm
Author :
Yan Hu ; Wei Li ; Ying Qiu ; Wei Wu
Author_Institution :
Sch. of Comput. Sci. & Technol., Wuhan Univ. of Technol., Wuhan
Abstract :
Based on an analysis of the shortcomings of existing Chinese matching algorithms and an examination of the characteristics of approximately duplicate records, this paper proposes a duplicate record cleaning method based on a reformative (improved) keyword matching algorithm. Experiments show that the method noticeably improves recall and precision for duplicate records.
Keywords :
data mining; data warehouses; pattern matching; Chinese matching algorithm; duplicate record cleaning technology; reformative keyword matching algorithm; Algorithm design and analysis; Cleaning; Computer science; Data analysis; Data handling; Databases; Internet
Conference_Title :
E-Business and Information System Security, 2009. EBISS '09. International Conference on
Conference_Location :
Wuhan
Print_ISBN :
978-1-4244-2909-7
Electronic_ISBN :
978-1-4244-2910-3
DOI :
10.1109/EBISS.2009.5138036