• DocumentCode
    1343608
  • Title

    A Genetic Approach to Statistical Disclosure Control

  • Author

    Smith, Jim E. ; Clark, Alistair R. ; Staggemeier, Andrea T. ; Serpell, Martin C.

  • Author_Institution
    Univ. of the West of England, Bristol, UK
  • Volume
    16
  • Issue
    3
  • fYear
    2012
  • fDate
    6/1/2012 12:00:00 AM
  • Firstpage
    431
  • Lastpage
    441
  • Abstract
    Statistical disclosure control is the collective name for a range of tools used by data providers, such as government departments, to protect the confidentiality of individuals or organizations. When the published tables contain magnitude data such as turnover or health statistics, the preferred method is to suppress the values of certain cells. Assigning a cost to the information lost by suppressing any given cell creates the “cell suppression problem,” which consists of finding the minimum-cost solution that meets the confidentiality constraints. Solving this problem simultaneously for all of the sensitive cells in a table is NP-hard and is not feasible for medium- to large-sized tables. In this paper, we describe the development of a heuristic tool for this problem that hybridizes linear programming (to solve a relaxed version for a single sensitive cell) with a genetic algorithm (to seek an order for considering the sensitive cells that minimizes the final cost). Considering a range of real-world and representative “artificial” datasets, we show that the method provides relatively low-cost solutions for far larger tables than the optimal approach can tackle. We show that our genetic approach significantly improves on the initial solutions provided by existing heuristics for cell ordering, and that it outperforms local search. This approach is then extended and applied to large statistical tables with over 200,000 cells.
  • Keywords
    data privacy; genetic algorithms; linear programming; search problems; statistical analysis; NP-hardness; cell suppression problem; confidentiality protection; data providers; genetic algorithm; genetic approach; government departments; health statistics; linear programming; sensitive cells; statistical disclosure control; statistical tables; Algorithm design and analysis; Analysis of variance; Equations; Genetic algorithms; Genetics; Linear programming; Perturbation methods; Statistical disclosure control
  • fLanguage
    English
  • Journal_Title
    Evolutionary Computation, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1089-778X
  • Type

    jour

  • DOI
    10.1109/TEVC.2011.2159271
  • Filename
    6036172