• DocumentCode
    2025109
  • Title

    Ensemble imputation methods for missing software engineering data

  • Author

    Twala, Bhekisipho ; Cartwright, Michelle

  • Author_Institution
    Brunel Univ., Manchester
  • fYear
    2005
  • fDate
    1-1 Sept. 2005
  • Lastpage
    30
  • Abstract
    One primary concern of software engineering is prediction accuracy. For example, we use datasets to build and validate prediction systems for software development effort. However, it is not uncommon for datasets to contain missing values. When machine learning techniques are used to build such prediction systems, the handling of incomplete data is an important issue for classifier learning, since missing values in the training set, the test set, or both can affect prediction accuracy. Many works in machine learning and statistics have shown that combining individual classifiers into an ensemble is an effective technique for improving classification accuracy. The ensemble strategy is investigated in the context of incomplete data and software prediction. An ensemble Bayesian multiple imputation and nearest neighbour single imputation method, BAMINNSI, is proposed that constructs ensembles based on two imputation methods. Strong results on two benchmark industrial datasets using decision trees support the method.
  • Keywords
    Bayes methods; decision trees; learning (artificial intelligence); software cost estimation; software metrics; classifier learning; data handling; ensemble Bayesian multiple imputation method; ensemble imputation methods; machine learning; nearest neighbour single imputation method; prediction systems; software development; software engineering; software prediction; Accuracy; Bayesian methods; Industrial training; Programming; Robustness; Software quality; Statistics; ensemble; imputation; incomplete data
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Software Metrics, 2005. 11th IEEE International Symposium
  • Conference_Location
    Como
  • ISSN
    1530-1435
  • Print_ISBN
    0-7695-2371-4
  • Type

    conf

  • DOI
    10.1109/METRICS.2005.21
  • Filename
    1509308