Title :
Red-RF: Reduced Random Forest for Big Data Using Priority Voting & Dynamic Data Reduction
Author :
Mohsen, Hussein ; Kurban, Hasan ; Zimmer, Kurt ; Jenne, Mark ; Dalkilic, Mehmet M.
Author_Institution :
Dept. of Comput. Sci., Indiana Univ., Bloomington, IN, USA
Abstract :
Random Forests (RFs) are effective ensemble models for classification. In this paper we present a new type of RF called Red(uced) RF, which adopts a new dynamic data reduction principle and a new voting mechanism called Priority Vote Weighting (PV); together these improve accuracy, execution time, and AUC values compared to Breiman's RF. Red-RF also shows that the strength of a random forest can increase without noticeably increasing the correlation between its trees. We compare the performance of Red-RF and Breiman's RF in 8 experiments involving classification problems with datasets of different sizes. Finally, we conduct 2 additional experiments on considerably larger datasets of one million points each.
Keywords :
Big Data; data reduction; pattern classification; random processes; trees (mathematics); AUC values; Red-RF; classification; dynamic data reduction; priority vote weighting; priority voting; reduced random forest; trees; Accuracy; Correlation; Diabetes; Prediction algorithms; Radio frequency; Vegetation; random forests; weighted voting
Conference_Title :
Big Data (BigData Congress), 2015 IEEE International Congress on
Conference_Location :
New York, NY
Print_ISBN :
978-1-4673-7277-0
DOI :
10.1109/BigDataCongress.2015.26