DocumentCode :
1791673
Title :
Calculating feature importance in data streams with concept drift using Online Random Forest
Author :
Cassidy, Andrew Phelps ; Deviney, Frank A.
Author_Institution :
Commonwealth Comput. Res. Inc. (CCRi), Charlottesville, VA, USA
fYear :
2014
fDate :
27-30 Oct. 2014
Firstpage :
23
Lastpage :
28
Abstract :
Large-volume data streams with concept drift have garnered a great deal of attention in the machine learning community. Numerous researchers have proposed online learning algorithms that train iteratively from new observations and provide continuously relevant predictions. Compared to previous offline or sliding-window approaches, these algorithms have shown better predictive performance, rapid detection of and adaptation to concept drift, and increased scalability to high-volume or high-velocity data. Online Random Forest (ORF) is one such approach to streaming classification problems. We adapted the feature importance metrics of Mean Decrease in Accuracy (MDA) and Mean Decrease in Gini Impurity (MDG), both originally designed for offline Random Forest, to Online Random Forest so that they evolve with time and concept drift. Our work is novel in that previous streaming models have not provided any measures of feature importance. We experimentally tested our Online Random Forest versions of feature importance against their offline counterparts, and concluded that our approach to tracking the underlying drifting concepts in a simulated data stream is valid.
Keywords :
Big Data; learning (artificial intelligence); statistical analysis; very large databases; MDA; MDG; ORF; concept drift; feature importance metrics; high velocity data; high volume data; large volume data streams; machine learning community; mean decrease in Gini impurity; mean decrease in accuracy; offline random forest; online learning algorithms; online random forest; streaming models; Accuracy; Adaptation models; Impurities; Measurement; Prediction algorithms; Training; Vegetation; Concept Drift; Data Streams; Feature Importance; Online Random Forest;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Big Data (Big Data), 2014 IEEE International Conference on
Conference_Location :
Washington, DC
Type :
conf
DOI :
10.1109/BigData.2014.7004352
Filename :
7004352