• DocumentCode
    2514244
  • Title
    How to Find Relevant Data for Effort Estimation?
  • Author
    Kocaguneli, Ekrem; Menzies, Tim
  • Author_Institution
    Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV, USA
  • fYear
    2011
  • fDate
    22-23 Sept. 2011
  • Firstpage
    255
  • Lastpage
    264
  • Abstract
    Background: Building effort estimators requires training data. How can we find that data? It is tempting to cross the boundaries of development type, location, language, application and hardware to use existing datasets from other organizations. However, prior results caution that such cross data may not be useful. Aim: We test two conjectures: (1) instance selection can automatically prune irrelevant instances, and (2) retrieving from the remaining examples is useful for effort estimation, regardless of their source. Method: From 19 datasets, we selected 8 cross-within divisions (21 pairs of within-cross subsets) and evaluated these divisions under different analogy-based estimation (ABE) methods. Results: Between the within and cross experiments, there were few statistically significant differences in (i) the performance of effort estimators or (ii) the number of instances retrieved for estimation. Conclusion: For the purposes of effort estimation, there is little practical difference between cross and within data. After instance selection is applied, the remaining examples (whether from within or cross source divisions) can be used for effort estimation.
  • Keywords
    data handling; estimation theory; information retrieval; ABE; analogy based estimation; cross source divisions; effort estimation; instance selection; relevant data; Artificial neural networks; Buildings; Data models; Estimation; Organizations; Training; Training data; cross resource; k-NN; software cost estimation; within resource
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    2011 International Symposium on Empirical Software Engineering and Measurement (ESEM)
  • Conference_Location
    Banff, AB, Canada
  • ISSN
    1938-6451
  • Print_ISBN
    978-1-4577-2203-5
  • Type
    conf
  • DOI
    10.1109/ESEM.2011.34
  • Filename
    6092574