Title :
A new approach for data cleaning process
Author :
Krishnamoorthy, R. ; Kumar, Sahoo Subhendu ; Neelagund, Basavaraj
Author_Institution :
Dept. of CSE, Anna Univ., Chennai, India
Abstract :
This paper presents a new approach called Effective Data Cleaning (EDC). The proposed EDC technique identifies the relevant and irrelevant instances in a large data set by the degree of missing values in each instance, and it reconstructs the missing values in a relevant instance from its closest instance within the instance set. The EDC technique consists of two methods: Identify Relevant Instance (IRI) and Reconstruct Missing Value (RMV). The IRI method classifies each instance in the large instance set as relevant or irrelevant according to its degree of missing values, and the RMV method reconstructs the missing values in a relevant instance from its closest instance under a distance metric. Experimental results show that the proposed EDC technique is simple and effective at identifying the relevant and irrelevant instances and at reconstructing the missing values in the relevant instances from the closest instance with higher similarity.
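The two-stage idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the threshold, the distance metric, or any function names, so the 0.5 missing-value cutoff, the Euclidean distance over shared attributes, and every identifier below are assumptions made only for illustration.

```python
import math


def missing_degree(instance):
    """Fraction of attributes that are missing (None) in one instance."""
    return sum(v is None for v in instance) / len(instance)


def identify_relevant(instances, max_missing=0.5):
    """IRI-style split (assumed rule): an instance is relevant when its
    missing degree does not exceed the hypothetical threshold max_missing."""
    relevant, irrelevant = [], []
    for inst in instances:
        if missing_degree(inst) <= max_missing:
            relevant.append(inst)
        else:
            irrelevant.append(inst)
    return relevant, irrelevant


def distance(a, b):
    """Euclidean distance over attributes observed in both instances
    (an assumed metric; the abstract only says 'distance metric')."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared))


def reconstruct_missing(relevant):
    """RMV-style fill (assumed rule): copy each missing value from the
    closest relevant instance that has that attribute observed."""
    repaired = []
    for i, inst in enumerate(relevant):
        filled = list(inst)
        for j, value in enumerate(inst):
            if value is not None:
                continue
            donors = [other for k, other in enumerate(relevant)
                      if k != i and other[j] is not None]
            if donors:
                closest = min(donors, key=lambda other: distance(inst, other))
                filled[j] = closest[j]
        repaired.append(filled)
    return repaired


data = [
    [1.0, 2.0, None],    # one of three values missing
    [1.1, 2.1, 3.0],     # complete
    [9.0, 9.0, 9.0],     # complete
    [None, None, 5.0],   # two of three values missing
]
relevant, irrelevant = identify_relevant(data)
cleaned = reconstruct_missing(relevant)
```

In this toy data set the last instance exceeds the assumed cutoff and is set aside as irrelevant, while the missing third attribute of the first instance is taken from the second instance, its nearest neighbour on the two attributes they share.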
Keywords :
data handling; EDC technique; IRI method; RMV method; closest instance; distance metric; effective data cleaning process; identify relevant instance; instance set; irrelevant instance; reconstruct missing value; Data mining; Measurement; Effective Data Cleaning (EDC); Identify Relevant Instance (IRI) and Reconstruct Missing Value (RMV);
Conference_Title :
Recent Advances and Innovations in Engineering (ICRAIE), 2014
Conference_Location :
Jaipur
Print_ISBN :
978-1-4799-4041-7
DOI :
10.1109/ICRAIE.2014.6909249