DocumentCode :
1797324
Title :
Bias-corrected Quantile Regression Forests for high-dimensional data
Author :
Nguyen Thanh Tung ; Huang, Joshua Zhexue ; Thuy Thi Nguyen ; Khan, Imran
Author_Institution :
Shenzhen Key Lab. of High Performance Data Min., SIAT, Shenzhen, China
Volume :
1
fYear :
2014
fDate :
13-16 July 2014
Firstpage :
1
Lastpage :
6
Abstract :
The Quantile Regression Forest (QRF), a nonparametric regression method based on random forests, has been shown to perform well in terms of prediction accuracy, especially for non-Gaussian conditional distributions. However, the method may suffer from two kinds of bias when solving regression problems: bias in the feature selection stage and bias in solving the regression problem itself. In this paper, we propose a new bias-correction algorithm based on the QRF. To correct the first kind of bias, we propose a new feature sampling scheme that selects good features for growing trees, and the first-level QRF is built on this scheme. To correct the second kind of bias, the residual of the first-level QRF model is used as the response feature to train a second-level QRF model. The second-level model is then used to compute bias-corrected predictions. In our experiments, the proposed algorithm dramatically reduces prediction errors and outperforms most existing regression random forest models on both synthetic and well-known real-world data sets.
Keywords :
data mining; feature selection; nonparametric statistics; random processes; regression analysis; sampling methods; trees (mathematics); QRF model; bias-corrected prediction; bias-corrected quantile regression forests; bias-correction algorithm; data mining; feature sampling; feature selection stage; growing trees; high-dimensional data; nonGaussian conditional distribution; nonparametric regression method; prediction accuracy; prediction error; real-world data set; regression problem; regression random forests model; response feature; synthetic data set; Abstracts; Breast; Electronic mail; Predictive models; Radio frequency; Rivers; Servomotors; Bias Correction; Data mining; High-Dimensional Data; Quantile Regression Forests; Random Forests;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Machine Learning and Cybernetics (ICMLC), 2014 International Conference on
Conference_Location :
Lanzhou
ISSN :
2160-133X
Print_ISBN :
978-1-4799-4216-9
Type :
conf
DOI :
10.1109/ICMLC.2014.7009082
Filename :
7009082