(l1, ..., lq)-diversity for Anonymizing Sensitive Quasi-Identifiers

Abstract:

Many methods for privacy-preserving data mining have been proposed. Most of them assume that quasi-identifiers (QIDs) can be separated from sensitive attributes. For instance, they assume that address, job, and age are QIDs but not sensitive attributes, and that a disease name is a sensitive attribute but not a QID. In practice, however, any of these attributes can act as both a sensitive attribute and a QID, depending on the person. In this paper, we refer to such attributes as sensitive QIDs, and we propose a novel privacy definition, (l1, ..., lq)-diversity, together with a method that can handle sensitive QIDs. Our method consists of two algorithms: an anonymization algorithm and a reconstruction algorithm. The anonymization algorithm, which is run by data holders, is simple but effective, whereas the reconstruction algorithm, which is run by data users, can be tailored to each data user's objective. We evaluate the proposed method experimentally using real datasets.