• DocumentCode
    600261
  • Title

    Handling categorical variables in effort estimation

  • Author

    Tsunoda, Masafumi ; Amasaki, Sousuke ; Monden, Akito

  • Author_Institution
    Toyo Univ., Kawagoe, Japan
  • fYear
    2012
  • fDate
    20-21 Sept. 2012
  • Firstpage
    99
  • Lastpage
    102
  • Abstract
    Background: Accurate effort estimation is the basis of software development project management. The linear regression model is a widely used method for this purpose. A dataset used to build a model often includes categorical variables, such as the programming language. Categorical variables are usually handled with one of two methods: stratification or dummy variables. These methods have a positive effect on accuracy but also have shortcomings. Two other handling methods, interaction and the hierarchical linear model (HLM), might be able to compensate for these shortcomings. However, these two methods have not been examined in this research area. Aim: To give useful suggestions for handling categorical variables with stratification, transformation into dummy variables, interaction, or HLM when building an estimation model. Method: We built estimation models with the four handling methods on the ISBSG, NASA, and Desharnais datasets, and compared the accuracy of the methods. Results: The most effective method differed across datasets, and the difference was statistically significant on both mean balanced relative error (MBRE) and mean magnitude of relative error (MMRE). Interaction and HLM were effective in a certain case. Conclusions: Stratification and transformation into dummy variables should at least be tried in order to obtain an accurate model. In addition, we suggest that the application of interaction and HLM be considered when building the estimation model.
  • Keywords
    project management; regression analysis; software development management; Desharnais datasets; HLM; ISBSG datasets; MBRE; MMRE; NASA datasets; categorical variable handling; dummy variables; effort estimation; estimation model; hierarchical linear model; linear regression model; mean balanced relative error; mean magnitude of relative error; programming languages; software development project management; Accuracy; Buildings; Estimation; Linear regression; Mathematical model; NASA; Software; Model-based effort estimation; dummy variable; hierarchical linear model; interaction; mixed effects; stratification
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Empirical Software Engineering and Measurement (ESEM), 2012 ACM-IEEE International Symposium on
  • Conference_Location
    Lund
  • ISSN
    1938-6451
  • Print_ISBN
    978-1-4503-1056-7
  • Electronic_ISBN
    1938-6451
  • Type

    conf

  • DOI
    10.1145/2372251.2372267
  • Filename
    6475401
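
The abstract names dummy variables as one handling method and MMRE and MBRE as the accuracy criteria, but this record does not define them. The sketch below is a minimal illustration under stated assumptions: the helper names (`dummy_encode`, `mmre`, `mbre`) are hypothetical, the error formulas are the definitions commonly used in the effort-estimation literature rather than ones quoted from the paper, and effort values are assumed to be strictly positive.

```python
from statistics import mean


def dummy_encode(categories):
    """Turn a categorical column into 0/1 dummy columns.

    The alphabetically first level is dropped as the baseline, so a
    variable with k levels yields k - 1 columns (assumed convention).
    """
    levels = sorted(set(categories))
    baseline, rest = levels[0], levels[1:]
    rows = [[1 if value == level else 0 for level in rest]
            for value in categories]
    return baseline, rest, rows


def mmre(actual, predicted):
    """Mean magnitude of relative error: mean of |actual - predicted| / actual."""
    return mean(abs(a - p) / a for a, p in zip(actual, predicted))


def mbre(actual, predicted):
    """Mean balanced relative error: mean of |actual - predicted| / min(actual, predicted).

    Dividing by the smaller value penalizes under- and overestimation
    more evenly than MMRE does.
    """
    return mean(abs(a - p) / min(a, p) for a, p in zip(actual, predicted))


if __name__ == "__main__":
    # Hypothetical project data, for illustration only.
    languages = ["Java", "C", "Java", "COBOL"]
    baseline, columns, rows = dummy_encode(languages)
    print("baseline:", baseline, "columns:", columns)
    print("rows:", rows)

    actual_effort = [100.0, 200.0]
    estimated_effort = [110.0, 150.0]
    print("MMRE:", mmre(actual_effort, estimated_effort))
    print("MBRE:", mbre(actual_effort, estimated_effort))
```

The dummy columns produced this way would be appended to the numeric predictors of a linear regression model. Stratification would instead fit one model per category, and interaction or HLM would additionally let the slope vary by category.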