  • DocumentCode
    2364131
  • Title
    Robust prediction of fault-proneness by random forests
  • Author
    Guo, Lan ; Ma, Yan ; Cukic, Bojan ; Singh, Harshinder
  • Author_Institution
    Lane Dept. of Comput. Sci. & Electr. Eng., West Virginia Univ., Morgantown, WV, USA
  • fYear
    2004
  • fDate
    2-5 Nov. 2004
  • Firstpage
    417
  • Lastpage
    428
  • Abstract
    Accurate prediction of fault-prone modules (a module is equivalent to a C function or a C++ method) in the software development process enables effective detection and identification of defects. Such prediction models are especially beneficial for large-scale systems, where verification experts need to focus their attention and resources on problem areas in the system under development. This paper presents a novel methodology for predicting fault-prone modules, based on random forests. Random forests are an extension of decision tree learning. Instead of generating one decision tree, this methodology generates hundreds or even thousands of trees using subsets of the training data. The classification decision is obtained by voting. We applied random forests in five case studies based on NASA data sets. The prediction accuracy of the proposed methodology is generally higher than that achieved by logistic regression, discriminant analysis and the algorithms in two machine learning software packages, WEKA [I. H. Witten et al. (1999)] and See5. The difference in performance between the proposed methodology and the other methods is statistically significant. Further, the accuracy advantage of random forests over the other methods is more pronounced on larger data sets.
  • Keywords
    decision trees; software fault tolerance; software metrics; software quality; NASA data sets; decision tree learning; discriminant analysis; fault prone module; large scale system; logistic regression; machine learning; random forests; software development process; software packages; Decision trees; Fault detection; Fault diagnosis; Large-scale systems; NASA; Predictive models; Programming; Robustness; Training data; Voting
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Software Reliability Engineering, 2004. ISSRE 2004. 15th International Symposium on
  • ISSN
    1071-9458
  • Print_ISBN
    0-7695-2215-7
  • Type
    conf
  • DOI
    10.1109/ISSRE.2004.35
  • Filename
    1383136
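
The abstract describes random forests as many trees trained on subsets of the training data, with the classification decided by voting. The sketch below illustrates only that idea and is not the authors' implementation: it swaps full decision trees for one-level decision stumps, each trained on a bootstrap sample and one randomly chosen metric. The module metrics (lines of code, cyclomatic complexity), the data values, and all function names are hypothetical and chosen for illustration.

```python
import random
from collections import Counter

# Hypothetical module metrics: (lines of code, cyclomatic complexity).
# Label 1 = fault-prone, 0 = not fault-prone. Values are made up.
ROWS = [(20, 2), (35, 3), (50, 4), (40, 2),
        (300, 25), (420, 30), (260, 18), (510, 40)]
LABELS = [0, 0, 0, 0, 1, 1, 1, 1]


def train_stump(rows, labels, feature):
    """Find the threshold on one feature with the fewest training errors."""
    best = None
    for threshold in sorted({r[feature] for r in rows}):
        for high_label in (0, 1):
            # Rows above the threshold get high_label, the rest get the other label.
            errors = sum(
                (high_label if r[feature] > threshold else 1 - high_label) != y
                for r, y in zip(rows, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, high_label)
    _, threshold, high_label = best
    return feature, threshold, high_label


def predict_stump(stump, row):
    feature, threshold, high_label = stump
    return high_label if row[feature] > threshold else 1 - high_label


def train_forest(rows, labels, n_trees=101, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        feature = rng.randrange(len(rows[0]))       # one random metric per stump
        forest.append(
            train_stump([rows[i] for i in idx], [labels[i] for i in idx], feature)
        )
    return forest


def predict_forest(forest, row):
    """Majority vote over all stumps in the forest."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]


forest = train_forest(ROWS, LABELS)
print(predict_forest(forest, (600, 50)))  # a large, complex module
print(predict_forest(forest, (15, 1)))    # a small, simple module
```

An odd number of voters avoids ties in the two-class case. A real random forest grows full trees and picks a random subset of features at every split; the stumps here keep the example short.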