Title :
An efficient approach for data privacy in distributed environment using Nearest Neighbor Search Anonymization
Author :
Madhuridevi, L. ; JesuVedhaNayahi, J. ; Kavitha, V.
Author_Institution :
Anna Univ. of Technol., Tirunelveli, India
Abstract :
Data mining is a technique for identifying patterns and trends in large collections of data. The collected data may contain personal information whose disclosure can violate the privacy of individuals, which makes privacy a critical issue in data mining. Existing privacy-preserving data mining techniques work well for relational data with a fixed schema and low dimensionality. In this paper, an anonymization method for sparse high-dimensional transactional data is proposed. An anonymized group formation strategy is used that relies on efficient Nearest-Neighbor (NN) search in high-dimensional spaces. The problem of high dimensionality is addressed by anonymizing each group of transactions according to its relevant Quasi Identifiers (QIDs). The privacy requirement is fulfilled by partitioning the transactional dataset into disjoint sets of transactions, referred to as anonymized groups. These groups contain QIDs and the frequencies of sensitive items. The proposed NN search algorithm maximizes the quality of each individual group and can be used for sparse high-dimensional data. However, the number of groups formed is proportional to the number of sensitive items, which paves the way for inference attacks. To overcome this problem, anonymization can be integrated with anatomization, in which the same data is published as two distinct tables, the quasi-identifier table and the sensitive table. This enhancement would prevent inference attacks, which are the major drawback of the NN search algorithm.
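The grouping and two-table publication described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify the distance measure, the seeding rule, or the group size, so the Jaccard distance, the greedy seeding, the `group_size` parameter, the function names, and the sample transactions below are all assumptions made for illustration.

```python
from collections import Counter


def jaccard_distance(a, b):
    """Distance between two item sets: 1 - |intersection| / |union|."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def form_groups(transactions, sensitive_items, group_size):
    """Greedily partition transaction indices into disjoint groups.

    Assumed strategy: seed each group with the first unassigned
    transaction, then add its nearest unassigned neighbours, measured
    on quasi-identifier (non-sensitive) items only.
    """
    qids = [t - sensitive_items for t in transactions]
    unassigned = list(range(len(transactions)))
    groups = []
    while unassigned:
        seed = unassigned.pop(0)
        # Brute-force nearest-neighbour search over the remaining rows.
        unassigned.sort(key=lambda i: jaccard_distance(qids[seed], qids[i]))
        groups.append([seed] + unassigned[:group_size - 1])
        unassigned = unassigned[group_size - 1:]
    # Merge an undersized final group into the previous one.
    if len(groups) > 1 and len(groups[-1]) < group_size:
        last = groups.pop()
        groups[-1].extend(last)
    return groups


def anatomize(transactions, sensitive_items, groups):
    """Publish the data as a QID table and a sensitive table.

    The QID table lists each transaction's non-sensitive items with its
    group id; the sensitive table lists per-group item frequencies.
    """
    qid_table, sensitive_table = [], []
    for gid, members in enumerate(groups):
        counts = Counter()
        for i in members:
            qid_table.append((gid, sorted(transactions[i] - sensitive_items)))
            counts.update(transactions[i] & sensitive_items)
        for item, n in sorted(counts.items()):
            sensitive_table.append((gid, item, n))
    return qid_table, sensitive_table


# Hypothetical sample data, not taken from the paper.
transactions = [
    {"bread", "milk", "hiv_test"},
    {"bread", "milk"},
    {"beer", "chips"},
    {"beer", "chips", "antidepressant"},
    {"bread", "butter"},
    {"beer", "nuts"},
]
sensitive = {"hiv_test", "antidepressant"}

groups = form_groups(transactions, sensitive, group_size=3)
qid_table, sensitive_table = anatomize(transactions, sensitive, groups)
```

In this sketch the sensitive table links a sensitive item only to a group, never to a single transaction. The brute-force sort stands in for the efficient high-dimensional NN search that the abstract refers to but does not detail.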
Keywords :
data mining; data privacy; pattern clustering; search problems; NN search algorithm; QID; anatomization; anonymization method; anonymized group formation strategy; data collection; disjoint transaction sets; high dimensional spaces; inference attack; nearest neighbor search algorithm; personal information; privacy preserving data mining; privacy requirement; quasi identifier table; relational data; sensitive table; sparse high-dimensional transactional data; transactional dataset partitioning; Accuracy; Classification algorithms; Data privacy; Educational institutions; Nearest neighbor searches; Privacy; Anatomization; Inference attack; Linking attack; Quasi Identifiers;
Conference_Title :
Recent Trends In Information Technology (ICRTIT), 2012 International Conference on
Conference_Location :
Chennai, Tamil Nadu
Print_ISBN :
978-1-4673-1599-9
DOI :
10.1109/ICRTIT.2012.6206786