• DocumentCode
    1255982
  • Title

    Data Mining Techniques for Software Effort Estimation: A Comparative Study

  • Author

    Dejaeger, Karel ; Verbeke, Wouter ; Martens, David ; Baesens, Bart

  • Author_Institution
    Dept. of Decision Sci. & Inf. Manage., Katholieke Univ. Leuven, Leuven, Belgium
  • Volume
    38
  • Issue
    2
  • fYear
    2012
  • Firstpage
    375
  • Lastpage
    397
  • Abstract
    A predictive model is required to be accurate and comprehensible in order to inspire confidence in a business setting. Both aspects have been assessed in a software effort estimation setting by previous studies. However, no univocal conclusion as to which technique is the most suited has been reached. This study addresses this issue by reporting on the results of a large-scale benchmarking study. Different types of techniques are under consideration, including techniques inducing tree/rule-based models like M5 and CART, linear models such as various types of linear regression, nonlinear models (MARS, multilayered perceptron neural networks, radial basis function networks, and least squares support vector machines), and estimation techniques that do not explicitly induce a model (e.g., a case-based reasoning approach). Furthermore, the aspect of feature subset selection by using a generic backward input selection wrapper is investigated. The results are subjected to rigorous statistical testing and indicate that ordinary least squares regression in combination with a logarithmic transformation performs best. Another key finding is that by selecting a subset of highly predictive attributes, such as project size and development- and environment-related attributes, typically a significant increase in estimation accuracy can be obtained.
  • Keywords
    data mining; program testing; regression analysis; software cost estimation; CART; M5; data mining techniques; estimation techniques; feature subset selection; generic backward input selection wrapper; linear regression; logarithmic transformation; nonlinear models; ordinary least squares regression; predictive model; rigorous statistical testing; rule-based models; software effort estimation; Artificial neural networks; Cognition; Data models; Estimation; Regression tree analysis; Software; regression
  • fLanguage
    English
  • Journal_Title
    Software Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0098-5589
  • Type

    jour

  • DOI
    10.1109/TSE.2011.55
  • Filename
    5928350