• DocumentCode
    3059687
  • Title
    Estimating class probabilities in random forests
  • Author
    Boström, Henrik
  • Author_Institution
    Univ. of Skövde, Skövde
  • fYear
    2007
  • fDate
    13-15 Dec. 2007
  • Firstpage
    211
  • Lastpage
    216
  • Abstract
    For both single probability estimation trees (PETs) and ensembles of such trees, commonly employed class probability estimates correct the observed relative class frequencies in each leaf to avoid anomalies caused by small sample sizes. The effect of such corrections in random forests of PETs is investigated, and the use of the relative class frequency is compared to the use of two corrected estimates, the Laplace estimate and the m-estimate. An experiment with 34 datasets from the UCI repository shows that estimating class probabilities using the relative class frequency clearly outperforms both the Laplace estimate and the m-estimate with respect to accuracy, area under the ROC curve (AUC) and Brier score. Hence, in contrast to what is commonly employed for PETs and ensembles of PETs, these results strongly suggest that a non-corrected probability estimate should be used in random forests of PETs. The experiment further shows that learning random forests of PETs using the relative class frequency significantly outperforms learning random forests of classification trees (i.e., trees for which only an unweighted vote on the most probable class is counted) with respect to both accuracy and AUC, but that random forests of classification trees are clearly ahead of random forests of PETs with respect to Brier score.
  • Keywords
    probability; trees (mathematics); Brier score; Laplace estimate; ROC curve; class probabilities; m-estimate; observed relative class frequencies; random forests; single probability estimation trees; Area measurement; Classification tree analysis; Frequency estimation; Informatics; Machine learning; Positron emission tomography; Probability distribution; Voting
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Applications, 2007. ICMLA 2007. Sixth International Conference on
  • Conference_Location
    Cincinnati, OH
  • Print_ISBN
    978-0-7695-3069-7
  • Type
    conf
  • DOI
    10.1109/ICMLA.2007.64
  • Filename
    4457233
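
The three leaf-level estimates compared in the abstract can be illustrated with a minimal Python sketch. It uses the standard textbook definitions: relative frequency n_c / n, Laplace estimate (n_c + 1) / (n + k) for k classes, and m-estimate (n_c + m * p_c) / (n + m) for a class prior p_c. The function names, the example leaf counts, the uniform priors and the value m = 2 are assumptions made for illustration only; the abstract does not state the settings used in the paper, and this is not the author's implementation.

```python
from typing import List, Sequence


def relative_frequency(counts: Sequence[int]) -> List[float]:
    """Uncorrected estimate: n_c / n for each class c in a leaf."""
    n = sum(counts)
    if n == 0:
        raise ValueError("leaf contains no training examples")
    return [c / n for c in counts]


def laplace_estimate(counts: Sequence[int]) -> List[float]:
    """Laplace estimate: (n_c + 1) / (n + k), with k the number of classes."""
    n, k = sum(counts), len(counts)
    return [(c + 1) / (n + k) for c in counts]


def m_estimate(counts: Sequence[int], priors: Sequence[float], m: float) -> List[float]:
    """m-estimate: (n_c + m * p_c) / (n + m), with p_c the prior of class c."""
    n = sum(counts)
    return [(c + m * p) / (n + m) for c, p in zip(counts, priors)]


def forest_average(per_tree: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-tree class distributions (one common way to combine PETs)."""
    n_trees = len(per_tree)
    n_classes = len(per_tree[0])
    return [sum(tree[i] for tree in per_tree) / n_trees for i in range(n_classes)]


# Hypothetical leaf holding 3 examples of class 0 and none of class 1.
leaf_counts = [3, 0]
print(relative_frequency(leaf_counts))
print(laplace_estimate(leaf_counts))
print(m_estimate(leaf_counts, priors=[0.5, 0.5], m=2.0))

# Hypothetical forest of two trees whose leaves hold [3, 0] and [1, 1].
print(forest_average([relative_frequency([3, 0]), relative_frequency([1, 1])]))
```

With uniform priors and m equal to the number of classes, the m-estimate coincides with the Laplace estimate, which is why the second and third printed distributions agree in this example. Both corrections pull a small, pure leaf away from the extreme estimate (1, 0) that the relative frequency returns.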