DocumentCode
2199704
Title
An efficient approach for data privacy in distributed environment using Nearest Neighbor Search Anonymization
Author
Madhuridevi, L. ; JesuVedhaNayahi, J. ; Kavitha, V.
Author_Institution
Anna Univ. of Technol., Tirunelveli, India
fYear
2012
fDate
19-21 April 2012
Firstpage
413
Lastpage
416
Abstract
Data mining is a technique for identifying patterns and trends in large collections of data. The collected data may contain personal information whose disclosure can violate the privacy of individuals, which makes privacy a critical issue in data mining. Existing privacy-preserving data mining techniques work well for relational data with a fixed schema and low dimensionality. In this paper, an anonymization method for sparse high-dimensional transactional data is proposed. It uses an anonymized group formation strategy that relies on efficient Nearest-Neighbor (NN) search in high-dimensional spaces. The problem of high dimensionality is addressed by anonymizing each group of transactions according to its relevant Quasi Identifiers (QIDs). The privacy requirement is fulfilled by partitioning the transactional dataset into disjoint sets of transactions, referred to as anonymized groups. These groups contain the QIDs and the frequencies of the sensitive items. The proposed NN search algorithm maximizes the quality of each individual group and can be used for sparse high-dimensional data. However, the number of groups formed is proportional to the number of sensitive items, which leaves the published data open to inference attacks. To overcome this problem, anonymization can be integrated with anatomization, in which the same data is published as two distinct tables: the quasi-identifier table and the sensitive table. This enhancement would prevent inference attacks, which are the major drawback of the NN search algorithm.
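The grouping and two-table publication described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not name the distance measure, the group size parameter, or the search structure, so the Jaccard distance, the parameter `k`, the brute-force neighbor search, and the function names `form_groups` and `anatomize` are all assumptions chosen for illustration.

```python
from collections import Counter


def jaccard_distance(a, b):
    """Jaccard distance between two item sets (0 = identical, 1 = disjoint)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def form_groups(transactions, k):
    """Greedily partition transaction indices into disjoint groups.

    Each transaction is a (qid_items, sensitive_items) pair of sets.
    Assumption: a group is a seed plus its k-1 nearest unassigned
    neighbors by Jaccard distance on the QID items. The search here is
    brute force, not the efficient high-dimensional NN search of the paper.
    """
    unassigned = list(range(len(transactions)))
    groups = []
    while len(unassigned) >= k:
        seed = unassigned.pop(0)
        seed_qid = transactions[seed][0]
        neighbors = sorted(
            unassigned,
            key=lambda i: jaccard_distance(seed_qid, transactions[i][0]),
        )[: k - 1]
        groups.append([seed] + neighbors)
        unassigned = [i for i in unassigned if i not in neighbors]
    if unassigned:
        # Fewer than k transactions remain: merge them into the last group.
        # If no group exists yet, they form one undersized group.
        if groups:
            groups[-1].extend(unassigned)
        else:
            groups.append(unassigned)
    return groups


def anatomize(transactions, groups):
    """Publish the grouped data as two tables linked only by group id."""
    qid_table = []        # rows: (group_id, sorted QID items)
    sensitive_table = []  # rows: (group_id, sensitive item, frequency)
    for gid, members in enumerate(groups):
        counts = Counter()
        for i in members:
            qid_items, sensitive_items = transactions[i]
            qid_table.append((gid, sorted(qid_items)))
            counts.update(sensitive_items)
        for item, n in sorted(counts.items()):
            sensitive_table.append((gid, item, n))
    return qid_table, sensitive_table


# Hypothetical toy data: QID items are ordinary purchases,
# sensitive items are labelled s1 and s2.
transactions = [
    ({"bread", "milk"}, {"s1"}),
    ({"bread", "milk", "eggs"}, set()),
    ({"wine", "cheese"}, {"s2"}),
    ({"wine", "cheese", "olives"}, set()),
]
groups = form_groups(transactions, k=2)
qid_table, sensitive_table = anatomize(transactions, groups)
```

In this sketch the sensitive table records only how often each sensitive item occurs within a group, so a reader of the two tables cannot tell which transaction in the group carried it.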
Keywords
data mining; data privacy; pattern clustering; search problems; NN search algorithm; QID; anatomization; anonymization method; anonymized group formation strategy; data collection; disjoint transaction sets; high dimensional spaces; inference attack; nearest neighbor search algorithm; personal information; privacy preserving data mining; privacy requirement; quasi identifier table; relational data; sensitive table; sparse high-dimensional transactional data; transactional dataset partitioning; Accuracy; Classification algorithms; Data privacy; Educational institutions; Nearest neighbor searches; Privacy; Anatomization; Inference attack; Linking attack; Quasi Identifiers;
fLanguage
English
Publisher
IEEE
Conference_Titel
Recent Trends In Information Technology (ICRTIT), 2012 International Conference on
Conference_Location
Chennai, Tamil Nadu
Print_ISBN
978-1-4673-1599-9
Type
conf
DOI
10.1109/ICRTIT.2012.6206786
Filename
6206786