• DocumentCode
    637185
  • Title

    Data anonymization that leads to the most accurate estimates of statistical characteristics

  • Author

    Xiang, Gang; Kreinovich, Vladik

  • Author_Institution
    Appl. Biomath., Setauket, NY, USA
  • fYear
    2013
  • fDate
    16-19 April 2013
  • Firstpage
    163
  • Lastpage
    170
  • Abstract
    To preserve privacy, we divide the data space into boxes and, instead of the original data points, store only the corresponding boxes. In accordance with current practice, the desired level of privacy is established by having at least k different records in each box, for a given value k (the larger the value k, the higher the privacy level). When we process the data, the use of boxes instead of the original exact values leads to uncertainty. In this paper, we find the (asymptotically) optimal subdivision of data into boxes: a subdivision that provides, for a given statistical characteristic such as variance, covariance, or correlation, the smallest uncertainty within the given level of privacy. In areas where the empirical data density is small, boxes containing k points are large, which results in large uncertainty. To avoid this, we propose, when computing the corresponding characteristic, to use only data from boxes with a sufficiently large density. This deletion of data points increases the statistical uncertainty but decreases the uncertainty caused by introducing the privacy-related boxes. We explain how to compute an (asymptotically) optimal threshold for which the overall uncertainty is (asymptotically) the smallest.
  • Keywords
    data privacy; statistical analysis; data anonymization; data space; empirical data density; privacy preservation; privacy-related boxes; statistical characteristics; statistical uncertainty; Accuracy; Computational intelligence; Correlation; Data privacy; Privacy; Uncertainty; Vectors
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Intelligence for Engineering Solutions (CIES), 2013 IEEE Symposium on
  • Conference_Location
    Singapore
  • Type

    conf

  • DOI
    10.1109/CIES.2013.6611744
  • Filename
    6611744
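
The box idea described in the abstract can be illustrated with a minimal one-dimensional sketch. This is not the paper's optimal subdivision: it assumes the simplest possible scheme (sort the values, cut them into consecutive groups of at least k points, and keep only each group's range and count), and it uses the mean rather than variance or correlation, because the range of the mean over interval data is easy to compute exactly. The function names `anonymize_1d`, `mean_bounds`, and `drop_sparse`, and the `min_density` parameter, are hypothetical and introduced only for this illustration.

```python
from typing import List, Tuple

# A box is stored as (low, high, count); the original points are discarded.
Box = Tuple[float, float, int]


def anonymize_1d(values: List[float], k: int) -> List[Box]:
    """Cut the sorted values into consecutive boxes of at least k points each.

    Assumption for this sketch: box edges are the smallest and largest
    value in each group. The paper instead derives an asymptotically
    optimal subdivision, which is not reproduced here.
    """
    if k < 1 or len(values) < k:
        raise ValueError("need k >= 1 and at least k values")
    xs = sorted(values)
    n_boxes = len(xs) // k
    boxes: List[Box] = []
    for i in range(n_boxes):
        start = i * k
        # The last box absorbs leftover points so every box has >= k records.
        end = (i + 1) * k if i < n_boxes - 1 else len(xs)
        chunk = xs[start:end]
        boxes.append((chunk[0], chunk[-1], len(chunk)))
    return boxes


def mean_bounds(boxes: List[Box]) -> Tuple[float, float]:
    """Range of possible sample means when each point is only known to lie in its box."""
    n = sum(count for _, _, count in boxes)
    lower = sum(low * count for low, _, count in boxes) / n
    upper = sum(high * count for _, high, count in boxes) / n
    return lower, upper


def drop_sparse(boxes: List[Box], min_density: float) -> List[Box]:
    """Keep only boxes whose density (count / width) reaches min_density.

    The threshold is chosen by hand here; the paper computes an
    asymptotically optimal one.
    """
    kept: List[Box] = []
    for low, high, count in boxes:
        width = high - low
        if width == 0 or count / width >= min_density:
            kept.append((low, high, count))
    return kept


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 100]
    boxes = anonymize_1d(data, k=3)
    print(boxes)                      # [(1, 3, 3), (4, 100, 4)]
    print(mean_bounds(boxes))         # wide range, caused by the sparse box
    dense = drop_sparse(boxes, min_density=0.5)
    print(mean_bounds(dense))         # (1.0, 3.0)
```

In this toy data set, the outlier at 100 forces the second box to be very wide, so the range of possible means is wide as well. Dropping that low-density box narrows the range at the cost of using fewer points, which is the trade-off between box-related and statistical uncertainty that the abstract describes.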