DocumentCode :
3464512
Title :
Data mining for removing fuzzy duplicates using fuzzy inference
Author :
Shahri, Hamid Haidarian ; Barforush, Ahmad Abdollah Zadeh
Author_Institution :
Dept. of Comput. Eng. & IT, Amirkabir Univ. of Technol., Tehran, Iran
Volume :
1
fYear :
2004
fDate :
27-30 June 2004
Firstpage :
419
Abstract :
Data cleaning deals with the detection and elimination of inconsistencies in data gathered from distributed sources. This process is essential for drawing correct conclusions from data in decision support systems. Mining the data for the removal of fuzzy duplicate records is a challenging part of the cleaning process. The vagueness and uncertainty involved in detecting fuzzy duplicates make it a niche for applying fuzzy reasoning. Although uncertainty algebras like fuzzy logic are known, their applicability to the problem of duplicate elimination has not been explored. In this paper, a practical and novel duplicate elimination system is presented, which exploits a fuzzy inference engine to handle the uncertainty involved in detecting fuzzy duplicates. The innovation of the system lies in capturing an expert's knowledge in the form of natural language fuzzy rules and using these simple rules to clean the data efficiently. This, in turn, reduces the time required for the repetitive and time-consuming task of hard-coding de-duplication logic based on the schema of each database.
Keywords :
data mining; data reduction; decision support systems; fuzzy logic; fuzzy set theory; fuzzy systems; inference mechanisms; natural languages; uncertainty handling; data cleaning process; data mining; decision support systems; deduplication; experts knowledge; fuzzy duplicate detection; fuzzy duplicate elimination system; fuzzy duplicate record removal; fuzzy inference engine; fuzzy logic; fuzzy reasoning; hard coding; natural language fuzzy rules; time reduction; uncertainty algebra; uncertainty handling; vagueness; Algebra; Cleaning; Data mining; Decision support systems; Engines; Fuzzy logic; Fuzzy reasoning; Fuzzy systems; Technological innovation; Uncertainty;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
IEEE Annual Meeting of the Fuzzy Information, 2004. Processing NAFIPS '04
Print_ISBN :
0-7803-8376-1
Type :
conf
DOI :
10.1109/NAFIPS.2004.1336319
Filename :
1336319