DocumentCode :
1990525
Title :
Can We Trust Our Results? A Mapping Study on Data Quality
Author :
Rosli, Marshima Mohd ; Tempero, Ewan ; Luxton-Reilly, Andrew
Author_Institution :
Dept. of Comput. Sci., Univ. of Auckland, Auckland, New Zealand
Volume :
1
fYear :
2013
fDate :
2-5 Dec. 2013
Firstpage :
116
Lastpage :
123
Abstract :
Background: The quality of data sets used in software engineering research is of the utmost importance. To ensure credibility of results obtained from use of data sets, the quality of the data must be examined. Objective: This study provides an overview of recent research (2008-2012) involving data quality in software engineering data sets, with the goal of understanding what research addresses data quality and, in particular, determining to what degree researchers have addressed data quality issues in order to evaluate the trustworthiness of their results. Method: We performed a systematic mapping study to investigate the treatment of data quality issues in software engineering research. A total of 64 papers published from 2008 to 2012 explicitly address issues with the quality of data and use software engineering data sets. These studies were classified according to the data quality topic, data set and data quality problem. Results: We found that only 31 studies gave serious consideration to how the quality of the data affected their results. We observed that there is a lack of clear and consistent terminology regarding data quality, especially with respect to the kinds of quality problems a data set might have. As a first step to address this problem, we propose a model that describes the lifecycle that research data goes through when used in research. Conclusions: The results suggest that researchers should give more attention to the quality of data sets in order to produce trustworthy data for reliable empirical research, and that the research community needs to better understand and communicate issues with data quality.
Keywords :
data handling; software engineering; trusted computing; data quality problem; data quality topic; data set; software engineering datasets; software engineering research; systematic mapping study; trustworthiness; trustworthy data; Cleaning; Context; Data collection; Manuals; Noise; Software engineering; Systematics; data quality; empirical studies; software engineering data sets; systematic mapping;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Software Engineering Conference (APSEC), 2013 20th Asia-Pacific
Conference_Location :
Bangkok
ISSN :
1530-1362
Print_ISBN :
978-1-4799-2143-0
Type :
conf
DOI :
10.1109/APSEC.2013.26
Filename :
6805397