• DocumentCode
    2412034
  • Title
    Measuring Disclosure Risk for Multimethod Synthetic Data Generation
  • Author
    Larsen, Michael D.; Huckett, Jennifer C.
  • Author_Institution
    Dept. of Stat., George Washington Univ., Washington, DC, USA
  • fYear
    2010
  • fDate
    20-22 Aug. 2010
  • Firstpage
    808
  • Lastpage
    815
  • Abstract
    Government agencies must simultaneously maintain the confidentiality of individual records and disseminate useful microdata. We propose a method for creating synthetic data that combines quantile regression, hot deck imputation, and rank swapping. Applying the procedure yields a releasable data set containing original values for a few key variables, synthetic quantile regression predictions for several variables, and imputed and perturbed values for the remaining variables. To measure disclosure risk in the resulting synthetic data set, we extend existing probabilistic risk measures that imitate an intruder attempting to match a record in the released data with information previously available on a target respondent.
  • Keywords
    government data processing; regression analysis; risk analysis; security of data; disclosure risk measurement; government agencies; hot deck imputation; multimethod synthetic data generation; probabilistic risk measures; rank swapping; synthetic quantile regression; Biological system modeling; Computational modeling; Data models; Equations; Joints; Mathematical model; Predictive models; quantile regression; statistical disclosure limitation; synthetic data
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Social Computing (SocialCom), 2010 IEEE Second International Conference on
  • Conference_Location
    Minneapolis, MN
  • Print_ISBN
    978-1-4244-8439-3
  • Electronic_ISBN
    978-0-7695-4211-9
  • Type
    conf
  • DOI
    10.1109/SocialCom.2010.123
  • Filename
    5591463
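
The abstract names rank swapping as one of the three combined techniques but does not describe its details. As a rough illustration only, the following is a minimal sketch of a generic rank-swapping step, not the authors' procedure. The function name `rank_swap`, the `window_pct` parameter, and the example values are all hypothetical and chosen for this sketch: each value is exchanged with a randomly chosen partner whose rank lies within a fixed percentage of the data set size.

```python
import random


def rank_swap(values, window_pct=10.0, seed=0):
    """Generic rank swapping (illustrative sketch, not the paper's method).

    Each value is swapped with a randomly chosen, not-yet-swapped partner
    whose rank is at most window_pct percent of n positions higher.
    """
    rng = random.Random(seed)
    n = len(values)
    window = max(1, int(n * window_pct / 100))  # assumed window definition
    # Record indices ordered by value, i.e. order[r] is the record at rank r.
    order = sorted(range(n), key=lambda i: values[i])
    swapped = list(values)
    used = [False] * n  # rank positions that have already been swapped
    for r in range(n):
        if used[r]:
            continue
        # Candidate partners: unswapped ranks just above r, inside the window.
        candidates = [s for s in range(r + 1, min(n, r + window + 1))
                      if not used[s]]
        if not candidates:
            continue  # no partner left, so this record keeps its value
        s = rng.choice(candidates)
        i, j = order[r], order[s]
        swapped[i], swapped[j] = values[j], values[i]
        used[r] = used[s] = True
    return swapped


# Hypothetical usage: perturb a small column of made-up incomes.
incomes = [31000, 45500, 28750, 61200, 39900, 52300, 47800, 35600]
print(rank_swap(incomes, window_pct=25.0, seed=1))
```

Because values are only exchanged between records, the marginal distribution of the variable is preserved exactly, while the window size bounds how far any single record's value can move in rank.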