DocumentCode
2412034
Title
Measuring Disclosure Risk for Multimethod Synthetic Data Generation
Author
Larsen, Michael D. ; Huckett, Jennifer C.
Author_Institution
Dept. of Stat., George Washington Univ., Washington, DC, USA
fYear
2010
fDate
20-22 Aug. 2010
Firstpage
808
Lastpage
815
Abstract
Government agencies must simultaneously maintain the confidentiality of individual records and disseminate useful microdata. We propose a method for creating synthetic data that combines quantile regression, hot deck imputation, and rank swapping. Implementing the proposed procedure yields a releasable data set containing original values for a few key variables, synthetic quantile regression predictions for several variables, and imputed and perturbed values for the remaining variables. To measure the disclosure risk in the resulting synthetic data set, we extend existing probabilistic risk measures that mimic an intruder attempting to match a record in the released data with information previously available on a target respondent.
Keywords
government data processing; regression analysis; risk analysis; security of data; disclosure risk measurement; government agencies; hot deck imputation; multimethod synthetic data generation; probabilistic risk measures; rank swapping; synthetic quantile regression; Biological system modeling; Computational modeling; Data models; Equations; Joints; Mathematical model; Predictive models; hot deck imputation; quantile regression; rank swapping; statistical disclosure limitation; synthetic data;
fLanguage
English
Publisher
IEEE
Conference_Titel
Social Computing (SocialCom), 2010 IEEE Second International Conference on
Conference_Location
Minneapolis, MN
Print_ISBN
978-1-4244-8439-3
Electronic_ISBN
978-0-7695-4211-9
Type
conf
DOI
10.1109/SocialCom.2010.123
Filename
5591463