Title :
Evaluation and automatic selection of methods for handling missing data
Author :
Zou, Ying ; An, Aijun ; Huang, Xiangji
Author_Institution :
Dept. of Comput. Sci. & Eng., York Univ., Toronto, Ont., Canada
Abstract :
Real-world data often contain missing values. In this paper, nine methods for handling missing data are described and compared. Thirty real-world datasets are used as input data in our experiments. Missing values are introduced into each training data set at nine different levels in order to investigate how the methods perform as the amount of missing data changes. For evaluation, two classification techniques, C4.5 and ELEM2, are applied to obtain classification error rates. The experimental results show that different methods can have different effects on a given data set. Based on these results, we also propose a meta-learning approach to selecting a method for handling missing data. The evaluation results show that the meta-learning approach is effective in selecting a suitable method.
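The experimental setup described in the abstract (introduce missing values at a chosen level, then apply a handling method) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the abstract does not name the nine methods or the nine missing levels, so column-mean imputation stands in for one possible method, the rates 0.1, 0.3 and 0.5 are example levels only, and the helper names `inject_missing`, `impute_mean` and `missing_fraction` are hypothetical. The classification step with C4.5 or ELEM2 is not shown.

```python
import random
from statistics import mean


def inject_missing(rows, rate, seed=0):
    """Return a copy of rows with round(rate * cells) entries set to None."""
    rng = random.Random(seed)
    n_rows, n_cols = len(rows), len(rows[0])
    cells = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    to_blank = rng.sample(cells, round(rate * len(cells)))
    out = [list(row) for row in rows]  # copy so the input is not mutated
    for r, c in to_blank:
        out[r][c] = None
    return out


def impute_mean(rows):
    """Fill each None with the mean of the observed values in its column.

    Mean imputation is an assumed stand-in, not a method confirmed by the abstract.
    """
    n_cols = len(rows[0])
    col_means = []
    for c in range(n_cols):
        observed = [row[c] for row in rows if row[c] is not None]
        # Fall back to 0.0 if a column lost every value (illustrative choice).
        col_means.append(mean(observed) if observed else 0.0)
    return [[col_means[c] if v is None else v for c, v in enumerate(row)]
            for row in rows]


def missing_fraction(rows):
    """Fraction of cells that are None."""
    cells = [v for row in rows for v in row]
    return sum(v is None for v in cells) / len(cells)


data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]]
for rate in (0.1, 0.3, 0.5):  # example levels only, not the paper's nine
    damaged = inject_missing(data, rate)
    repaired = impute_mean(damaged)
    print(rate, missing_fraction(damaged), missing_fraction(repaired))
```

In a fuller experiment of this kind, each repaired training set would be passed to a classifier and the resulting error rates compared across methods and missing levels.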
Keywords :
data handling; data mining; learning (artificial intelligence); pattern classification; C4.5; ELEM2; classification error rate; classification technique; metalearning approach; missing data handling; real-world dataset; training data set; Computer science; Data engineering; Data mining; Educational institutions; Error analysis; Linear regression; Medical treatment; Robustness; Testing; Training data;
Conference_Title :
Granular Computing, 2005 IEEE International Conference on
Print_ISBN :
0-7803-9017-2
DOI :
10.1109/GRC.2005.1547387