Title :
CAIRAD: A co-appearance based analysis for Incorrect Records and Attribute-values Detection
Author :
Rahman, Md Geaur ; Islam, Md Zahidul ; Bossomaier, Terry ; Gao, Junbin
Author_Institution :
Centre for Research in Complex Systems, Charles Sturt University, Bathurst, NSW, Australia
Abstract :
Data pre-processing and cleansing play a vital role in data mining by ensuring good data quality. Data cleansing tasks include the imputation of missing values and the identification and correction of incorrect/noisy data. In this paper, we present a novel approach called Co-appearance based Analysis for Incorrect Records and Attribute-values Detection (CAIRAD). For a data set containing incorrect/noisy values, CAIRAD separates the noisy records from the clean records, thereby producing two data sets: a clean data set and a data set containing all noisy records. It also reports the noisy attribute values of each noisy record. We evaluate CAIRAD on four publicly available natural data sets by comparing its performance with that of two high-quality existing techniques, namely RDCL and EDIR. We use various patterns of noisy values, each at different noise levels. Several evaluation criteria are used, including error recall (ER), error precision (EP), F-measure, record removal ratio (rRR), and the area under the receiver operating characteristic curve (AUC). Our experimental results indicate that CAIRAD performs significantly better than RDCL and EDIR, based on t-test analysis.
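The abstract describes CAIRAD's output (a clean data set, a noisy data set, and the flagged attribute values of each noisy record) and names ER, EP and F-measure as criteria, but it does not give their formulas. The sketch below is a minimal illustration under stated assumptions: it takes a set of flagged (record index, attribute) cells as input from any detector (the co-appearance analysis itself is not shown), treats a record as noisy if any of its values is flagged, and computes ER, EP and F-measure using the conventional recall/precision definitions at the attribute-value level. The names `split_records` and `error_metrics` are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: a flagged cell is (record index, attribute name).
Cell = Tuple[int, str]


def split_records(records: List[dict], noisy_cells: Set[Cell]):
    """Partition records into a clean and a noisy data set.

    Assumption: a record counts as noisy if at least one of its attribute
    values is flagged. Also returns a report mapping each noisy record index
    to the list of its flagged attributes.
    """
    report: Dict[int, List[str]] = {}
    for idx, attr in sorted(noisy_cells):
        report.setdefault(idx, []).append(attr)
    clean = [r for i, r in enumerate(records) if i not in report]
    noisy = [r for i, r in enumerate(records) if i in report]
    return clean, noisy, report


def error_metrics(detected: Set[Cell], actual: Set[Cell]) -> Tuple[float, float, float]:
    """Assumed definitions at attribute-value level:
    ER = correctly flagged / actually noisy, EP = correctly flagged / flagged,
    F-measure = harmonic mean of ER and EP. Empty denominators yield 0.0.
    """
    true_pos = len(detected & actual)
    er = true_pos / len(actual) if actual else 0.0
    ep = true_pos / len(detected) if detected else 0.0
    f = 2 * er * ep / (er + ep) if (er + ep) else 0.0
    return er, ep, f


if __name__ == "__main__":
    recs = [
        {"age": 30, "city": "Bathurst"},
        {"age": 999, "city": "Sydney"},
        {"age": 41, "city": "??"},
        {"age": 25, "city": "Orange"},
    ]
    flagged = {(1, "age"), (2, "city")}
    clean, noisy, report = split_records(recs, flagged)
    print(len(clean), len(noisy), report)
    print(error_metrics(flagged, {(1, "age"), (3, "age")}))
```

In the example, one of the two flagged values is truly noisy and one of the two truly noisy values is flagged, so ER, EP and F-measure all come out at 0.5. Record removal ratio and AUC are omitted because the abstract does not define how the paper computes them.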
Keywords :
data handling; data mining; sensitivity analysis; CAIRAD; EDIR; RDCL; clean data set; coappearance-based analysis for incorrect records and attribute-values detection; data cleansing; data mining; data preprocessing; high quality existing techniques; missing values; natural data sets; noisy attribute value detection; noisy records; receiver operating characteristics curve; t-test analysis; Computer aided manufacturing; Data mining; Noise; Noise measurement; Remuneration; Testing; Training data; Data Mining; Data cleansing; Data pre-processing; Noise Detection;
Conference_Title :
The 2012 International Joint Conference on Neural Networks (IJCNN)
Conference_Location :
Brisbane, QLD
Print_ISBN :
978-1-4673-1488-6
ISSN :
2161-4393
DOI :
10.1109/IJCNN.2012.6252669