Regional Information Center for Science and Technology - Analyzing data sets with missing data: an empirical evaluation of imputation methods and likelihood-based methods

DocumentCode :

1549613

Title :

Analyzing data sets with missing data: an empirical evaluation of imputation methods and likelihood-based methods

Author :

Myrtveit, Ingunn ; Stensrud, Erik ; Olsson, Ulf H.

Author_Institution :

Norwegian Sch. of Manage., Sandvika, Norway

Volume :

Issue :

fYear :

2001

fDate :

11/1/2001

Firstpage :

999

Lastpage :

1013

Abstract :

Missing data are often encountered in data sets used to construct software effort prediction models. Thus far, the common practice has been to ignore observations with missing data. This may result in biased prediction models. The authors evaluate four missing data techniques (MDTs) in the context of software cost modeling: listwise deletion (LD), mean imputation (MI), similar response pattern imputation (SRPI), and full information maximum likelihood (FIML). We apply the MDTs to an ERP data set, and thereafter construct regression-based prediction models using the resulting data sets. The evaluation suggests that only FIML is appropriate when the data are not missing completely at random (MCAR). Unlike FIML, prediction models constructed on LD, MI and SRPI data sets will be biased unless the data are MCAR. Furthermore, compared to LD, MI and SRPI seem appropriate only if the resulting LD data set is too small to enable the construction of a meaningful regression-based prediction model.
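
As a rough illustration of the two simplest techniques named in the abstract, the sketch below applies listwise deletion (LD) and mean imputation (MI) to a tiny made-up table. The records, the column names `size` and `effort`, and the helper functions are hypothetical and are not taken from the paper or its ERP data set; SRPI and FIML are not shown.

```python
from statistics import mean

# Hypothetical toy records (NOT the ERP data set from the paper).
# None marks a missing value.
rows = [
    {"size": 10.0, "effort": 100.0},
    {"size": None, "effort": 150.0},
    {"size": 30.0, "effort": 310.0},
    {"size": 40.0, "effort": None},
]


def listwise_deletion(records):
    """LD: keep only the records that have no missing values."""
    return [r for r in records if all(v is not None for v in r.values())]


def mean_imputation(records):
    """MI: replace each missing value with the mean of the observed
    values in the same column."""
    columns = list(records[0].keys())
    col_means = {
        c: mean(r[c] for r in records if r[c] is not None) for c in columns
    }
    return [
        {c: (r[c] if r[c] is not None else col_means[c]) for c in columns}
        for r in records
    ]


ld_rows = listwise_deletion(rows)
mi_rows = mean_imputation(rows)
print(len(ld_rows), len(mi_rows))  # prints: 2 4
```

In this toy case LD discards half of the observations, while MI keeps all four but fills the gaps with column means, which shrinks the variance of the imputed columns. A regression-based prediction model would then be fitted to either resulting data set.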

Keywords :

data analysis; maximum likelihood estimation; software cost estimation; statistical analysis; ERP data set; FIML; LD; MCAR; MDTs; MI; SRPI data sets; biased prediction models; data set analysis; information maximum likelihood; listwise deletion; mean imputation; missing completely at random; missing data techniques; regression-based prediction model; regression-based prediction models; similar response pattern imputation; software cost modeling; software effort prediction models; Context modeling; Costs; Data analysis; Databases; Enterprise resource planning; Information analysis; Maximum likelihood estimation; Predictive models; Software engineering; Software standards

fLanguage :

English

Journal_Title :

Software Engineering, IEEE Transactions on

Publisher :

IEEE

ISSN :

0098-5589

Type :

jour

DOI :

10.1109/32.965340

Filename :

965340

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1549613