• DocumentCode
    1665293
  • Title

    Red-RF: Reduced Random Forest for Big Data Using Priority Voting & Dynamic Data Reduction

  • Author

    Mohsen, Hussein ; Kurban, Hasan ; Zimmer, Kurt ; Jenne, Mark ; Dalkilic, Mehmet M.

  • Author_Institution
    Dept. of Comput. Sci., Indiana Univ., Bloomington, IN, USA
  • fYear
    2015
  • Firstpage
    118
  • Lastpage
    125
  • Abstract
    Random Forests have been used as effective ensemble models for classification. In this paper we present a new type of Random Forest (RF), called Red(uced) RF, that adopts a new dynamic data reduction principle and a new voting mechanism called Priority Vote Weighting (PV), which together improve accuracy, execution time, and AUC values compared to Breiman's RF. Red-RF also shows that the strength of a random forest can increase without noticeably increasing the correlation between the trees. We compare the performance of Red-RF and Breiman's RF in 8 experiments involving classification problems with datasets of different sizes. Finally, we conduct 2 additional experiments on considerably larger datasets of one million points each.
  • Keywords
    Big Data; data reduction; pattern classification; random processes; trees (mathematics); AUC values; Red-RF; big data; classification; dynamic data reduction; priority vote weighting; priority voting; reduced random forest; trees; Accuracy; Big data; Correlation; Diabetes; Prediction algorithms; Radio frequency; Vegetation; random forests; weighted voting
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Big Data (BigData Congress), 2015 IEEE International Congress on
  • Conference_Location
    New York, NY
  • Print_ISBN
    978-1-4673-7277-0
  • Type

    conf

  • DOI
    10.1109/BigDataCongress.2015.26
  • Filename
    7207210