DocumentCode :
1519624
Title :
Anonymous Publication of Sensitive Transactional Data
Author :
Ghinita, Gabriel ; Kalnis, Panos ; Tao, Yufei
Author_Institution :
Dept. of Comput. Sci., Purdue Univ., West Lafayette, IN, USA
Volume :
23
Issue :
2
fYear :
2011
Firstpage :
161
Lastpage :
174
Abstract :
Existing research on privacy-preserving data publishing focuses on relational data: in this context, the objective is to enforce privacy-preserving paradigms, such as k-anonymity and ℓ-diversity, while minimizing the information loss incurred in the anonymizing process (i.e., maximizing data utility). Existing techniques work well for fixed-schema data with low dimensionality. Nevertheless, certain applications require privacy-preserving publishing of transactional data (or basket data), which involve hundreds or even thousands of dimensions, rendering existing methods unusable. We propose two categories of novel anonymization methods for sparse high-dimensional data. The first category is based on approximate nearest-neighbor (NN) search in high-dimensional spaces, which is efficiently performed through locality-sensitive hashing (LSH). In the second category, we propose two data transformations that capture the correlation in the underlying data: 1) reduction to a band matrix and 2) Gray encoding-based sorting. These representations facilitate the formation of anonymized groups with low information loss, through an efficient linear-time heuristic. We show experimentally, using real-life data sets, that all our methods clearly outperform the existing state of the art. Among the proposed techniques, NN search yields superior data utility compared to the band matrix transformation, but incurs higher computational overhead. The data transformation based on Gray code sorting performs best in terms of both data utility and execution time.
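The Gray encoding-based sorting mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each transaction is represented as a fixed-length 0/1 item vector, and the names `gray_rank` and `gray_sort` are hypothetical. Each vector is read as a reflected binary Gray code word and transactions are ordered by that word's position in the Gray sequence, so that neighbors in the sorted order tend to differ in few items. The paper's group-formation heuristic is not reproduced here.

```python
from typing import List, Sequence


def gray_rank(bits: Sequence[int]) -> int:
    """Return the position of a 0/1 vector in the reflected binary Gray code sequence."""
    rank = 0
    prev = 0  # most recently decoded binary digit
    for g in bits:
        prev ^= g                  # binary digit = previous binary digit XOR Gray digit
        rank = (rank << 1) | prev  # append the decoded digit to the rank
    return rank


def gray_sort(transactions: List[Sequence[int]]) -> List[Sequence[int]]:
    """Order transactions (assumed 0/1 item vectors) by their Gray code rank."""
    return sorted(transactions, key=gray_rank)


if __name__ == "__main__":
    # Hypothetical toy data: four transactions over two items.
    data = [[1, 0], [0, 0], [1, 1], [0, 1]]
    for row in gray_sort(data):
        print(row)
```

In the two-item example, the sorted order follows the Gray sequence 00, 01, 11, 10, in which consecutive vectors differ in exactly one item.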
Keywords :
Gray codes; cryptography; data privacy; pattern recognition; publishing; sorting; Gray encoding-based sorting; anonymous publication; band matrix; locality-sensitive hashing; nearest-neighbor search; privacy-preserving data publishing; sensitive transactional data; Data mining; Data processing; Nearest neighbor searches; Neural networks; Pregnancy test; Privacy; Publishing; Reflective binary codes; Sparse matrices; Privacy; anonymity; transactional data
fLanguage :
English
Journal_Title :
Knowledge and Data Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
1041-4347
Type :
jour
DOI :
10.1109/TKDE.2010.101
Filename :
5487522