• DocumentCode
    1048159
  • Title

    Privacy: a machine learning view

  • Author

    Vinterbo, Staal A.

  • Author_Institution
    Decision Syst. Group, Brigham & Women's Hosp., Boston, MA, USA
  • Volume
    16
  • Issue
    8
  • fYear
    2004
  • Firstpage
    939
  • Lastpage
    948
  • Abstract
    The problem of disseminating a data set for machine learning while controlling the disclosure of data source identity is described using a commuting diagram of functions. This formalization is used to present and analyze an optimization problem balancing privacy and data utility requirements. The analysis points to the application of a generalization mechanism for maintaining privacy in view of machine learning needs. We present new proofs of NP-hardness of the problem of minimizing information loss while satisfying a set of privacy requirements, both with and without the addition of a particular uniform coding requirement. As an initial analysis of the approximation properties of the problem, we show that the cell suppression problem with a constant number of attributes can be approximated within a constant. As a side effect, proofs of NP-hardness of the minimum k-union, maximum k-intersection, and parallel versions of these are presented. Bounded versions of these problems are also shown to be approximable within a constant.
  • Keywords
    computational complexity; data privacy; generalisation (artificial intelligence); inference mechanisms; learning (artificial intelligence); optimisation; NP-hardness; cell suppression problem; data set; data source identity; data utility requirements; generalization mechanism; machine learning; optimization problem; Aggregates; Control systems; Data privacy; Database systems; Helium; Humans; Insurance; Machine learning; Protection; US Government; 65; Privacy; approximation properties; combinatorial optimization; complexity; disclosure control; machine learning
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type

    jour

  • DOI
    10.1109/TKDE.2004.31
  • Filename
    1318579