DocumentCode :
3250982
Title :
De-anonymizing private data by matching statistics
Author :
Unnikrishnan, Jayakrishnan ; Naini, Farid M.
Author_Institution :
Sch. of Comput. & Commun. Sci., Ecole Polytech. Fed. de Lausanne (EPFL), Lausanne, Switzerland
fYear :
2013
fDate :
2-4 Oct. 2013
Firstpage :
1616
Lastpage :
1623
Abstract :
Recent research has illustrated privacy breaches that can be effected on an anonymized dataset by an attacker who has access to auxiliary information about the users. Most of these attack strategies rely on the uniqueness of specific aspects of the users´ data - e.g., observing a mobile user at just a few points on the time-location space are sufficient to uniquely identify him/her from an anonymized set of users. In this work, we consider de-anonymization attacks on anonymized summary statistics in the form of histograms. Such summary statistics are useful for many applications that do not need knowledge about exact user behavior. We consider an attacker who has access to an anonymized set of histograms of K users´ data and an independent set of data belonging to the same users. Modeling the users´ data as i.i.d., we study the composite hypothesis testing problem of identifying the correct matching between the anonymized histograms from the first set and the user data from the second. We propose a Generalized Likelihood Ratio Test as a solution to this problem and show that the solution can be identified using a minimum weight matching algorithm on an K × K complete bipartite weighted graph. We show that a variant of this solution is asymptotically optimal as the data lengths are increased.We apply the algorithm on mobility traces of over 1000 users on EPFL campus collected during two weeks and show that up to 70% of the users can be correctly matched. These results show that anonymized summary statistics of mobility traces themselves contain a significant amount of information that can be used to uniquely identify users by an attacker who has access to auxiliary information about the statistics.
Keywords :
data privacy; graph theory; mobile computing; statistical testing; EPFL campus; anonymized histogram set access; anonymized summary statistics; anonymized user data set; asymptotic optimality; auxiliary information access; complete bipartite weighted graph; composite hypothesis testing problem; data lengths; generalized likelihood ratio test; histograms; independent data set; minimum weight matching algorithm; mobile user; mobility traces; privacy breach; private data de-anonymization attack strategies; time-location space; user data modeling; Accuracy; Bipartite graph; Databases; Histograms; Testing; Trajectory; Wireless communication;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Communication, Control, and Computing (Allerton), 2013 51st Annual Allerton Conference on
Conference_Location :
Monticello, IL
Print_ISBN :
978-1-4799-3409-6
Type :
conf
DOI :
10.1109/Allerton.2013.6736722
Filename :
6736722
Link To Document :
بازگشت