DocumentCode :
3114414
Title :
Clustering-based Missing Value Imputation for Data Preprocessing
Author :
Zhang, Chengqi ; Qin, Yongsong ; Zhu, Xiaofeng ; Zhang, Jilian ; Zhang, Shichao
Author_Institution :
Fac. of Inf. Technol., Univ. of Technol. Sydney, Broadway, NSW
fYear :
2006
fDate :
16-18 Aug. 2006
Firstpage :
1081
Lastpage :
1086
Abstract :
Missing value imputation is a practical yet challenging problem in machine learning and data mining. Imputation is a procedure that replaces the missing values in a dataset with plausible values, which are generally generated from the dataset using a deterministic or random method. In this paper we propose a new and efficient missing value imputation method based on data clustering, called CRI (clustering-based random imputation). In our approach, the missing values of an instance are filled with plausible values generated from data similar to that instance using a kernel-based random method. Specifically, we first divide the dataset (excluding instances with missing values) into clusters. Each instance with missing values is then assigned to the cluster most similar to it. Finally, the missing values of an instance A are filled with plausible values generated by applying a kernel-based method to the instances in A's cluster. Our experiments (some of which use the decision tree induction system C5.0) demonstrate the effectiveness of the proposed method for the missing value imputation task.
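The three steps named in the abstract (cluster the complete instances, assign each incomplete instance to its most similar cluster, fill its gaps from that cluster with a kernel-based random draw) can be sketched as below. This is only an illustrative sketch, not the authors' implementation: the abstract does not specify the clustering algorithm, the similarity measure, or the kernel, so plain k-means, Euclidean distance over the observed attributes, and a Gaussian kernel with a hypothetical `bandwidth` parameter are all assumptions made here, as are the function names.

```python
import math
import random

def sq_dist(a, b, dims):
    """Squared Euclidean distance restricted to the attribute indices in dims."""
    return sum((a[j] - b[j]) ** 2 for j in dims)

def kmeans(rows, k, iters=20, rng=None):
    """Plain k-means on complete rows (assumed clustering step)."""
    rng = rng or random.Random(0)
    dims = range(len(rows[0]))
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels = [0] * len(rows)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(r, centroids[c], dims))
                  for r in rows]
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

def cluster_random_impute(data, k=2, bandwidth=1.0, seed=0):
    """Fill None entries by a kernel-weighted random draw within a cluster.

    Assumption: a Gaussian kernel over the observed attributes weights the
    donors; the paper's actual kernel-based method may differ.
    """
    rng = random.Random(seed)
    # Step 1: cluster only the instances without missing values.
    complete = [r for r in data if None not in r]
    centroids, labels = kmeans(complete, k, rng=rng)
    filled = []
    for row in data:
        if None not in row:
            filled.append(list(row))
            continue
        observed = [j for j, v in enumerate(row) if v is not None]
        # Step 2: assign the incomplete instance to the nearest centroid,
        # comparing only the attributes that are observed.
        c = min(range(k), key=lambda i: sq_dist(row, centroids[i], observed))
        donors = [r for r, lab in zip(complete, labels) if lab == c] or complete
        # Step 3: draw a donor at random, weighted by kernel similarity.
        weights = [math.exp(-sq_dist(row, d, observed) / (2 * bandwidth ** 2))
                   for d in donors]
        if sum(weights) == 0:  # all weights underflowed: fall back to uniform
            weights = None
        new_row = list(row)
        for j, v in enumerate(row):
            if v is None:
                new_row[j] = rng.choices(donors, weights=weights)[0][j]
        filled.append(new_row)
    return filled

# Toy data: two well-separated groups and two instances with a missing value.
example = [[1.0, 1.1], [1.2, 0.9], [0.9, 1.0],
           [10.0, 10.2], [10.1, 9.9], [9.8, 10.0],
           [1.1, None], [None, 10.1]]
print(cluster_random_impute(example, k=2))
```

Drawing an observed donor value, rather than a cluster mean, keeps the imputed values inside the range actually seen in the cluster; that choice is part of this sketch and is not taken from the abstract.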
Keywords :
data mining; learning (artificial intelligence); pattern clustering; random processes; clustering-based random imputation; data clustering; data mining; data preprocessing; kernel-based random method; machine learning; missing value imputation; Australia; Computer science; Data mining; Data preprocessing; Decision trees; Induction generators; Information technology; Machine learning; Nearest neighbor searches; Stochastic processes;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Industrial Informatics, 2006 IEEE International Conference on
Conference_Location :
Singapore
Print_ISBN :
0-7803-9700-2
Electronic_ISBN :
0-7803-9701-0
Type :
conf
DOI :
10.1109/INDIN.2006.275767
Filename :
4053540