• DocumentCode
    1468660
  • Title
    Exploiting the Essential Assumptions of Analogy-Based Effort Estimation
  • Author
    Kocaguneli, Ekrem; Menzies, Tim; Bener, Ayse Basar; Keung, Jacky W.
  • Author_Institution
    Lane Dept. of Comput. Sci. & Electr. Eng., West Virginia Univ., Morgantown, WV, USA
  • Volume
    38
  • Issue
    2
  • fYear
    2012
  • Firstpage
    425
  • Lastpage
    438
  • Abstract
    Background: There are too many design options for software effort estimators. How can we best explore them all? Aim: We seek general principles of effort estimation that can guide the design of effort estimators. Method: We identified the essential assumption of analogy-based effort estimation, i.e., the immediate neighbors of a project offer stable conclusions about that project. We test that assumption by generating a binary tree of clusters of effort data and comparing the variance of supertrees versus smaller subtrees. Results: For 10 data sets (from Coc81, Nasa93, Desharnais, Albrecht, ISBSG, and data from Turkish companies), we found: 1) the estimation variance of cluster subtrees is usually larger than that of cluster supertrees; 2) if analogy is restricted to the cluster trees with lower variance, then effort estimates have a significantly lower error (measured using MRE, AR, and Pred(25) with a Wilcoxon test, 95 percent confidence, compared to nearest neighbor methods that use neighborhoods of a fixed size). Conclusion: Estimation by analogy can be significantly improved by a dynamic selection of nearest neighbors, using only the project data from regions with small variance.
  • Keywords
    pattern clustering; program testing; project management; software cost estimation; trees (mathematics); Albrecht data set; Coc81 data set; Desharnais data set; ISBSG data set; Nasa93 data set; Turkish companies; analogy-based effort estimation; binary cluster tree; cluster subtrees; dynamic selection; essential assumption; estimation variance; nearest neighbor selection; project data; software effort estimator design; subtree variance; supertree variance; Estimation; Euclidean distance; Humans; Linear regression; Software; Training; Training data; Software cost estimation; analogy; k-NN
  • fLanguage
    English
  • Journal_Title
    Software Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    0098-5589
  • Type
    jour
  • DOI
    10.1109/TSE.2011.27
  • Filename
    5728833
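
The method summarized in the abstract (build a binary tree of clusters, compare the effort variance of each subtree with that of its supertree, and draw analogies only from the lower-variance region) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The clustering here is a simple top-down median split, swapped in for brevity, and the names `build_tree`, `estimate`, `mre`, and `pred`, the `min_size` parameter, and the toy project data are all assumptions made for illustration. `mre` and `pred` follow the usual definitions of magnitude of relative error and Pred(25).

```python
from statistics import median, pvariance


def build_tree(rows, min_size=2):
    """Build a binary cluster tree from rows of (features, effort).

    Assumption: a top-down split at the median of the widest feature,
    used here as a stand-in for whatever clustering the paper applies.
    """
    node = {
        "rows": rows,
        "var": pvariance([effort for _, effort in rows]),
        "left": None,
        "right": None,
    }
    if len(rows) < 2 * min_size:
        return node  # too small to split further: this is a leaf
    dims = range(len(rows[0][0]))
    # Pick the feature with the widest range of values.
    dim = max(dims, key=lambda d: max(f[d] for f, _ in rows) - min(f[d] for f, _ in rows))
    ordered = sorted(rows, key=lambda r: r[0][dim])
    mid = len(ordered) // 2
    node["dim"] = dim
    node["cut"] = ordered[mid][0][dim]
    node["left"] = build_tree(ordered[:mid], min_size)
    node["right"] = build_tree(ordered[mid:], min_size)
    return node


def estimate(tree, features):
    """Descend toward the query project, stopping before any subtree
    whose effort variance exceeds that of its supertree."""
    node = tree
    while node["left"] is not None:
        go_left = features[node["dim"]] < node["cut"]
        child = node["left"] if go_left else node["right"]
        if child["var"] > node["var"]:
            break  # the subtree is noisier than its supertree
        node = child
    return median([effort for _, effort in node["rows"]])


def mre(actual, predicted):
    """Magnitude of relative error for a single estimate."""
    return abs(actual - predicted) / actual


def pred(actuals, predictions, level=25):
    """Percentage of estimates whose MRE is at most level percent."""
    hits = sum(mre(a, p) <= level / 100 for a, p in zip(actuals, predictions))
    return 100 * hits / len(actuals)


# Toy data (hypothetical): one feature (size) and an effort value.
projects = [
    ((10,), 100), ((12,), 110), ((14,), 105), ((16,), 115),
    ((50,), 400), ((55,), 900), ((60,), 300), ((65,), 1200),
]
tree = build_tree(projects)
print(estimate(tree, (11,)))  # 105.0: descends to a tight, low-variance leaf
print(estimate(tree, (62,)))  # 650.0: stops early, the deeper subtree is noisier
```

In this sketch the number of analogies is not fixed in advance. The small projects form a stable region, so the estimate comes from two close neighbors, while the large projects are noisy, so the estimate falls back to the four projects of the enclosing cluster.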