Title :
Tree-Based Approach to Missing Data Imputation
Author :
Vateekul, Peerapon ; Sarinnapakorn, Kanoksri
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of Miami, Coral Gables, FL, USA
Abstract :
Missing data is a well-recognized issue in data mining, and imputation is one way to handle the problem. In this paper, we propose a novel tree-based imputation algorithm called "imputation tree" (ITree). It first studies the predictability of missingness using all observations by constructing a binary classification tree called "missing pattern tree" (MPT). Then, missing values in each cluster or terminal node are estimated by a regression tree of observations at that node. We present empirical results using both synthetic and real data. Almost all experiments demonstrate that ITree is superior to other commonly used methods in estimating missing values. The algorithm not only produces an impressive accuracy, but also provides information on the nature of missingness.
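The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`build_tree`, `find_leaf`, `itree_impute`), the depth parameters, the squared-error splitting rule, the restriction to one incomplete attribute with fully observed predictors, and the fallback to all observed rows when a terminal node holds no observed values are all assumptions made for illustration. A single squared-error tree builder serves both stages, because on a 0/1 missingness indicator squared error ranks splits the same way as Gini impurity.

```python
def sse(ys):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((v - m) ** 2 for v in ys)


def build_tree(X, y, max_depth, min_leaf=1):
    """Tiny CART-style tree (hypothetical helper) minimising squared error."""
    best = None
    if max_depth > 0:
        for j in range(len(X[0])):
            # Candidate thresholds: every distinct value except the largest.
            for t in sorted({row[j] for row in X})[:-1]:
                left = [i for i, row in enumerate(X) if row[j] <= t]
                right = [i for i, row in enumerate(X) if row[j] > t]
                if min(len(left), len(right)) < min_leaf:
                    continue
                cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
                if best is None or cost < best[0]:
                    best = (cost, j, t, left, right)
    # Stop when no split is allowed or no split reduces the error.
    if best is None or best[0] >= sse(y) - 1e-12:
        return {"value": sum(y) / len(y)}
    _, j, t, left, right = best
    return {
        "feature": j,
        "threshold": t,
        "left": build_tree([X[i] for i in left], [y[i] for i in left],
                           max_depth - 1, min_leaf),
        "right": build_tree([X[i] for i in right], [y[i] for i in right],
                            max_depth - 1, min_leaf),
    }


def find_leaf(tree, row):
    """Follow the splits and return the terminal node reached by `row`."""
    while "value" not in tree:
        side = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[side]
    return tree


def itree_impute(X, target, mpt_depth=1, reg_depth=2):
    """Impute None entries of `target` from the complete predictors in X."""
    all_observed = [i for i, v in enumerate(target) if v is not None]
    if not all_observed:
        raise ValueError("target has no observed values to learn from")

    # Stage 1: missing pattern tree, fitted to the missingness indicator.
    indicator = [1.0 if v is None else 0.0 for v in target]
    mpt = build_tree(X, indicator, mpt_depth)

    # Group rows by the MPT terminal node they fall into.
    clusters = {}
    for i, row in enumerate(X):
        clusters.setdefault(id(find_leaf(mpt, row)), []).append(i)

    # Stage 2: one regression tree per terminal node that contains gaps.
    imputed = list(target)
    for members in clusters.values():
        missing = [i for i in members if target[i] is None]
        if not missing:
            continue
        # Assumption: fall back to all observed rows if the node has none.
        observed = [i for i in members if target[i] is not None] or all_observed
        reg = build_tree([X[i] for i in observed],
                         [target[i] for i in observed], reg_depth)
        for i in missing:
            imputed[i] = find_leaf(reg, X[i])["value"]
    return imputed
```

Calling `itree_impute(X, target)` with `X` as a list of complete feature rows and `target` as a list containing `None` for missing entries returns a copy of `target` with the gaps filled. The splits of the first-stage tree can be inspected separately to see which predictors are associated with missingness, which is the kind of information on the nature of missingness that the abstract mentions.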
Keywords :
data mining; trees (mathematics); binary classification tree; imputation tree; missing data imputation; missing pattern tree; regression tree; tree-based imputation algorithm; Classification tree analysis; Clustering algorithms; Conferences; Data handling; Decision trees; Peer to peer computing; Regression tree analysis; Testing; USA Councils
Conference_Title :
IEEE International Conference on Data Mining Workshops, 2009 (ICDMW '09)
Conference_Location :
Miami, FL
Print_ISBN :
978-1-4244-5384-9
Electronic_ISBN :
978-0-7695-3902-7
DOI :
10.1109/ICDMW.2009.92