A privacy attack that removes the majority of the noise from perturbed data

Author

Sramka, Michal

Author_Institution

Dept. of Comput. Eng. & Math., Rovira i Virgili Univ., Tarragona, Spain

fYear

2010

fDate

18-23 July 2010

Firstpage

1

Lastpage

8

Abstract

Data perturbation is a sanitization method that helps restrict the disclosure of sensitive information from published data. We present an attack on the privacy of published data that has been sanitized using data perturbation. The attack employs data mining to remove some of the noise from the perturbed sensitive values. Our attack is practical, can be launched by non-expert adversaries, and does not require any background knowledge. Extensive experiments were performed on four databases derived from UCI's Adult and IPUMS census-based data sets, sanitized with noise addition that satisfies ε-differential privacy. The experimental results confirm that our attack presents a significant privacy risk to published perturbed data. The results show that up to 93% of the noise added during perturbation can be effectively removed using general-purpose data miners from the Weka software package. Interestingly, the higher the intended level of privacy, the higher the percentage of noise that can be removed. This suggests that adding more noise does not always increase the actual privacy.
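The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's method or experimental setup: the paper uses general-purpose data miners from Weka on census data, whereas this example assumes a synthetic data set, a single public attribute, a plain least-squares line as the "data miner", Laplace noise with scale equal to an assumed value range divided by ε, and a ratio of mean absolute errors as the measure of noise removed. All function and variable names are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def fit_line(xs, ys):
    # Ordinary least squares for y = slope * x + intercept.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(estimates, truth):
    return sum(abs(e - t) for e, t in zip(estimates, truth)) / len(truth)


def fraction_of_noise_removed(epsilon, n=2000, seed=0):
    rng = random.Random(seed)
    # Assumption: one public attribute that correlates with the sensitive one.
    public = [rng.uniform(0.0, 10.0) for _ in range(n)]
    sensitive = [3.0 * x + rng.gauss(0.0, 1.0) for x in public]
    # Assumption: noise scale = assumed value range / epsilon.
    value_range = 30.0
    scale = value_range / epsilon
    published = [s + laplace_noise(scale, rng) for s in sensitive]
    # The adversary sees only `public` and `published`, fits a model on them,
    # and uses the model's predictions as estimates of the sensitive values.
    slope, intercept = fit_line(public, published)
    estimates = [slope * x + intercept for x in public]
    error_before = mean_abs_error(published, sensitive)
    error_after = mean_abs_error(estimates, sensitive)
    return 1.0 - error_after / error_before


if __name__ == "__main__":
    for eps in (0.1, 1.0, 10.0):
        print(eps, round(fraction_of_noise_removed(eps), 3))
```

In this toy setting a smaller ε (more noise) should leave a larger fraction of the noise removable, because the fitted line averages the independent noise out across records. That matches the direction of the abstract's observation, but the figures printed here depend entirely on the synthetic data and are not the paper's results.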

Keywords

data mining; data privacy; security of data; software packages; ε-differential privacy; Weka software package; data perturbation; privacy attack; published data privacy; sanitization method; Databases; Estimation; Noise; Prediction algorithms; Privacy

fLanguage

English

Publisher

IEEE

Conference_Titel

Neural Networks (IJCNN), The 2010 International Joint Conference on

Conference_Location

Barcelona

ISSN

1098-7576

Print_ISBN

978-1-4244-6916-1

Type

conf

DOI

10.1109/IJCNN.2010.5596527

Filename

5596527