Regional Information Center for Science and Technology - Comparison of various methods for handling incomplete data in software engineering databases

DocumentCode :

2548793

Title :

Comparison of various methods for handling incomplete data in software engineering databases

Author :

Twala, Bhekisipho ; Cartwright, Michelle ; Shepperd, Martin

Author_Institution :

Brunel Univ., Uxbridge, UK

fYear :

2005

fDate :

17-18 Nov. 2005

Abstract :

Increasing the awareness of how missing data affects software predictive accuracy has led to increasing numbers of missing data techniques (MDTs). This paper investigates the robustness and accuracy of eight popular techniques for tolerating incomplete training and test data using tree-based models. MDTs were compared by artificially simulating different proportions, patterns, and mechanisms of missing data. A 4-way repeated measures design was employed to analyze the data. The simulation results suggest important differences. Listwise deletion is substantially inferior while multiple imputation (MI) represents a superior approach to handling missing data. Decision tree single imputation and surrogate variables splitting are more severely impacted by missing values distributed among all attributes. MI should be used if the data contain many missing values. If few values are missing, any of the MDTs might be considered. Choice of technique should be guided by pattern and mechanisms of missing data.

Keywords :

data handling; database management systems; decision trees; software fault tolerance; software performance evaluation; decision tree; imputation representation; missing data technique; software engineering database; software predictive accuracy; tree-based model; Accuracy; Data analysis; Databases; Decision trees; Machine learning; Machine learning algorithms; Robustness; Software engineering; Software quality; Testing

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Empirical Software Engineering, 2005. 2005 International Symposium on

Print_ISBN :

0-7803-9507-7

Type :

conf

DOI :

10.1109/ISESE.2005.1541819

Filename :

1541819

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2548793