Title :
Efficient missing data imputation for supervised learning
Author :
Zhang, Shichao ; Wu, Xindong ; Zhu, Manlong
Author_Institution :
Dept. of Comput. Sci., Zhejiang Normal Univ., Jinhua, China
Abstract :
In supervised learning, missing values commonly appear in the training set. Missing values in a dataset may introduce bias, degrading the quality of the supervised learning process or the performance of classification algorithms. This implies that a reliable method for dealing with missing values is necessary. In this paper, we analyze the difference between iterative imputation of missing values and single imputation in real-world applications. We propose an EM-style iterative imputation method, in which each missing attribute value is iteratively filled using a predictor constructed from the known values and the values predicted for the missing attribute values in previous iterations. We also demonstrate that it is reasonable to consider the imputation ordering when filling in multiple missing attribute values, and therefore introduce a method for imputation ordering. We experimentally show that our approach significantly outperforms some standard machine learning methods for handling missing values in classification tasks.
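The iterative scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name iterative_impute, the least-squares linear predictor, the mean-based initial fill, and the "fewest missing values first" ordering are all assumptions chosen for brevity, since the abstract does not specify the predictor or the ordering criterion.

```python
import numpy as np

def iterative_impute(X, n_iter=10, tol=1e-6):
    """Hypothetical EM-style iterative imputation (illustrative sketch only)."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Initial fill (assumption): mean of the observed values in each column.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    # Assumed ordering: columns with the fewest missing values are imputed first.
    order = np.argsort(missing.sum(axis=0))
    cols = [j for j in order if missing[:, j].any()]
    for _ in range(n_iter):
        previous = filled.copy()
        for j in cols:
            rows = missing[:, j]
            # Predictors: all other columns (known values plus current
            # imputations) and a constant term for the intercept.
            others = np.delete(filled, j, axis=1)
            A = np.column_stack([others, np.ones(len(filled))])
            # Fit a linear predictor (assumption) on rows where column j is observed.
            coef, *_ = np.linalg.lstsq(A[~rows], filled[~rows, j], rcond=None)
            # Re-impute only the originally missing entries of column j.
            filled[rows, j] = A[rows] @ coef
        # Stop once the imputed values no longer change between iterations.
        if np.max(np.abs(filled - previous)) < tol:
            break
    return filled

# Toy data in which the second column equals 2 * first column + 1.
X = np.array([[1.0, 3.0], [2.0, 5.0], [3.0, 7.0], [4.0, np.nan]])
result = iterative_impute(X)
```

In this toy case the observed rows follow the linear relation exactly, so the missing entry is filled with a value very close to 9. Observed entries are never overwritten; only the originally missing cells are updated in each pass.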
Keywords :
iterative methods; learning (artificial intelligence); pattern classification; EM-style iterative imputation method; attribute value; classification tasks; efficient missing data imputation; imputation ordering method; machine learning methods; supervised learning process; Algorithm design and analysis; Classification algorithms; Computer science; Convergence; Iterative methods; Mutual information; Prediction algorithms; Artificial intelligence; Data processing; Missing data imputation;
Conference_Title :
Cognitive Informatics (ICCI), 2010 9th IEEE International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-8041-8
DOI :
10.1109/COGINF.2010.5599826