DocumentCode
2443447
Title
Privacy and utility for defect prediction: Experiments with MORPH
Author
Peters, Fayola ; Menzies, Tim
Author_Institution
Lane Dept. of Comput. Sci. & Electr. Eng., West Virginia Univ., Morgantown, WV, USA
fYear
2012
fDate
2-9 June 2012
Firstpage
189
Lastpage
199
Abstract
Ideally, we can learn lessons from software projects across multiple organizations. However, a major impediment to such knowledge sharing is the privacy concerns of software development organizations. This paper aims to provide defect data-set owners with an effective means of privatizing their data prior to release. We explore MORPH, which understands how to maintain class boundaries in a data-set. MORPH is a data mutator that moves the data a random distance, taking care not to cross class boundaries. The value of training on this MORPHed data is tested via a 10-way within-learning study and a cross-learning study using Random Forests, Naive Bayes, and Logistic Regression for ten object-oriented defect datasets from the PROMISE data repository. Measured in terms of exposure of sensitive attributes, the MORPHed data was four times more private than the unMORPHed data. Also, in terms of the f-measures, there was little difference between the MORPHed and unMORPHed data (original data and data privatized by data-swapping) for both the cross and within studies. We conclude that, at least for the kinds of OO defect data studied in this project, data can be privatized without concerns for inference efficacy.
Keywords
Bayes methods; data privacy; object-oriented programming; regression analysis; software engineering; trees (mathematics); MORPH; PROMISE data repository; class boundaries; data mutator; defect prediction; f-measures; knowledge sharing; logistic regression; naive Bayes; object-oriented defect datasets; privacy concerns; random distance; random forests; software development organizations; software projects; Companies; Data privacy; Predictive models; Privacy; Privatization; Software; data mining; defect prediction; privacy;
fLanguage
English
Publisher
IEEE
Conference_Titel
Software Engineering (ICSE), 2012 34th International Conference on
Conference_Location
Zurich
ISSN
0270-5257
Print_ISBN
978-1-4673-1066-6
Electronic_ISBN
0270-5257
Type
conf
DOI
10.1109/ICSE.2012.6227194
Filename
6227194