Title :
Ensemble imputation methods for missing software engineering data
Author :
Twala, Bhekisipho ; Cartwright, Michelle
Author_Institution :
Brunel Univ., Manchester
Abstract :
A primary concern of software engineering is prediction accuracy. We use datasets, for example, to build and validate prediction systems for software development effort. However, it is not uncommon for datasets to contain missing values. When machine learning techniques are used to build such prediction systems, handling incomplete data is an important issue for classifier learning, since missing values in the training set, the test set, or both can affect prediction accuracy. Many studies in machine learning and statistics have shown that combining individual classifiers into an ensemble is an effective technique for improving classification accuracy. This paper investigates the ensemble strategy in the context of incomplete data and software prediction. We propose BAMINNSI, an ensemble Bayesian multiple imputation and nearest neighbour single imputation method that constructs ensembles based on these two imputation methods. Strong results on two benchmark industrial datasets using decision trees support the method.
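The abstract describes BAMINNSI only at a high level. One plausible reading, sketched below under stated assumptions, is: build several completed copies of the training data (one by nearest-neighbour single imputation, several by multiple imputation), train one decision tree per copy, and combine their predictions by majority vote. This is not the authors' algorithm. All function names are hypothetical, a simple normal-draw imputer stands in for Bayesian multiple imputation, depth-1 trees (stumps) stand in for full decision trees, and the distance measure and voting rule are assumptions.

```python
import math
import random
from collections import Counter

# Rows are lists of numbers; None marks a missing value. Assumes every
# column has at least one observed value. All helpers are hypothetical.


def knn_impute(rows):
    """Nearest-neighbour single imputation: copy each missing value from the
    closest row that observes that attribute, using RMS distance over the
    attributes both rows observe (assumption: no feature scaling)."""
    filled = []
    for i, row in enumerate(rows):
        new = list(row)
        for j, v in enumerate(row):
            if v is not None:
                continue
            best, best_d = None, math.inf
            for k, other in enumerate(rows):
                if k == i or other[j] is None:
                    continue
                diffs = [(a - b) ** 2 for a, b in zip(row, other)
                         if a is not None and b is not None]
                if not diffs:
                    continue  # no shared attribute to compare on
                d = math.sqrt(sum(diffs) / len(diffs))
                if d < best_d:
                    best, best_d = other[j], d
            if best is None:  # no usable donor: fall back to the column mean
                obs = [r[j] for r in rows if r[j] is not None]
                best = sum(obs) / len(obs)
            new[j] = best
        filled.append(new)
    return filled


def normal_draw_impute(rows, m, seed=0):
    """Simplified stand-in for Bayesian multiple imputation: return m
    completed copies, each missing value drawn from a normal fitted to the
    observed values of its column. A full Bayesian MI would also draw the
    model parameters from their posterior for each copy."""
    rng = random.Random(seed)
    stats = []
    for j in range(len(rows[0])):
        obs = [r[j] for r in rows if r[j] is not None]
        mu = sum(obs) / len(obs)
        var = sum((x - mu) ** 2 for x in obs) / max(len(obs) - 1, 1)
        stats.append((mu, math.sqrt(var)))
    return [[[v if v is not None else rng.gauss(*stats[j])
              for j, v in enumerate(r)] for r in rows]
            for _ in range(m)]


def fit_stump(X, y):
    """Depth-1 decision tree: the single threshold split with the fewest
    training errors; each side predicts its majority label."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(x[j] for x in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for x, lab in zip(X, y) if x[j] <= t]
            right = [lab for x, lab in zip(X, y) if x[j] > t]
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            err = (sum(lab != l_lab for lab in left)
                   + sum(lab != r_lab for lab in right))
            if best is None or err < best[0]:
                best = (err, j, t, l_lab, r_lab)
    if best is None:  # every feature is constant: predict the majority label
        majority = Counter(y).most_common(1)[0][0]
        return lambda x: majority
    _, j, t, l_lab, r_lab = best
    return lambda x: l_lab if x[j] <= t else r_lab


def ensemble_predict(X_train, y_train, x_test, m=5, seed=0):
    """Train one stump per completed training set (one NN-imputed copy plus
    m drawn copies) and majority-vote their labels for a complete test row."""
    completed = [knn_impute(X_train)] + normal_draw_impute(X_train, m, seed)
    votes = [fit_stump(Xc, y_train)(x_test) for Xc in completed]
    return Counter(votes).most_common(1)[0][0]
```

For example, with invented toy rows `X = [[10, 2.0], [12, None], [30, 5.0], [None, 6.0], [11, 1.5], [35, None]]` and labels `y = ["low", "low", "high", "high", "low", "high"]`, `knn_impute(X)` fills the gaps as `[[10, 2.0], [12, 1.5], [30, 5.0], [30, 6.0], [11, 1.5], [35, 5.0]]`, and `ensemble_predict(X, y, [33, 4.0])` votes across six stumps. Voting over differently imputed copies is one way to let the ensemble reflect imputation uncertainty; the paper's actual combination rule may differ.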
Keywords :
Bayes methods; decision trees; learning (artificial intelligence); software cost estimation; software metrics; classifier learning; data handling; decision trees; ensemble Bayesian multiple imputation method; ensemble imputation methods; machine learning; nearest neighbour single imputation method; prediction systems; software development; software engineering; software prediction; Accuracy; Bayesian methods; Decision trees; Industrial training; Machine learning; Programming; Robustness; Software engineering; Software quality; Statistics; Machine learning; decision trees; ensemble; imputation; incomplete data; software prediction;
Conference_Title :
Software Metrics, 2005. 11th IEEE International Symposium
Conference_Location :
Como
Print_ISBN :
0-7695-2371-4
DOI :
10.1109/METRICS.2005.21