DocumentCode
1665293
Title
Red-RF: Reduced Random Forest for Big Data Using Priority Voting & Dynamic Data Reduction
Author
Mohsen, Hussein ; Kurban, Hasan ; Zimmer, Kurt ; Jenne, Mark ; Dalkilic, Mehmet M.
Author_Institution
Dept. of Comput. Sci., Indiana Univ., Bloomington, IN, USA
fYear
2015
Firstpage
118
Lastpage
125
Abstract
Random Forests (RFs) have been used as effective ensemble models for classification. In this paper we present a new type of RF, called Red(uced) RF (Red-RF), that adopts a new dynamic data reduction principle and a new voting mechanism called Priority Vote Weighting (PV). Together these improve accuracy, execution time, and AUC values compared to Breiman's RF. Red-RF also shows that the strength of a random forest can increase without noticeably increasing the correlation between its trees. We then compare the performance of Red-RF and Breiman's RF in 8 experiments involving classification problems on datasets of different sizes. Finally, we conduct 2 additional experiments on considerably larger datasets of one million points each.
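Context for the strength/correlation claim: Breiman (2001) bounded the generalization error of a random forest by PE* <= rho_bar * (1 - s^2) / s^2, where rho_bar is the mean correlation between trees and s is the strength of the forest. The short Python sketch below evaluates that bound. It is an illustration only, not code from Red-RF. The function name and the example values of rho_bar and s are hypothetical. It shows why raising s while holding rho_bar roughly fixed tightens the bound.

```python
def breiman_error_bound(mean_correlation: float, strength: float) -> float:
    """Breiman's (2001) upper bound on random-forest generalization error:
    PE* <= rho_bar * (1 - s**2) / s**2, which requires strength s > 0.
    The function name is hypothetical, chosen for this illustration."""
    if not 0.0 < strength <= 1.0:
        raise ValueError("strength must lie in (0, 1]")
    return mean_correlation * (1.0 - strength**2) / strength**2


if __name__ == "__main__":
    rho_bar = 0.3  # hypothetical mean correlation between trees, held fixed
    for s in (0.5, 0.6, 0.7):  # hypothetical, increasing forest strength
        print(f"s={s:.1f}  bound={breiman_error_bound(rho_bar, s):.3f}")
```

With rho_bar fixed, each increase in s lowers the bound. A higher strength at unchanged correlation, as the abstract reports for Red-RF, therefore points toward lower generalization error.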
Keywords
Big Data; data reduction; pattern classification; random processes; trees (mathematics); AUC values; Red-RF; classification; dynamic data reduction; priority vote weighting; priority voting; reduced random forest; trees; Accuracy; Correlation; Diabetes; Prediction algorithms; Radio frequency; Vegetation; random forests; weighted voting
fLanguage
English
Publisher
IEEE
Conference_Title
Big Data (BigData Congress), 2015 IEEE International Congress on
Conference_Location
New York, NY
Print_ISBN
978-1-4673-7277-0
Type
conf
DOI
10.1109/BigDataCongress.2015.26
Filename
7207210